This paper presents a unified account of the theory of least squares and its adaptations to
statistical models more complicated than the classical one. First comes a development of the
properties of weak generalized matrix inverses, a useful variant of the more familiar
pseudo-inverse. These properties are employed in a proof of the usual Gauss theorem, and in
analyzing the case in which known linear restraints are obeyed by the parameters. Another
situation treated is that of a singular variance-covariance matrix for the observations.
Applications include the case of equi-correlated variables (including estimation despite
ignorance of the correlation), linear "restraints" subject to random error, and stepwise
linear estimation.



1. Introduction and Summary 

The aim of this paper is to present a unified account 
of the theory of least squares, and in particular to de- 
scribe the necessary modifications when the customary 
statistical model is complicated in certain ways re- 
quired for greater realism. The paper contains (prob- 
ably) new results, (probably) new proofs of known re- 
sults, and an (almost certainly) new overall treatment 
of the subject. Our hesitancy to make stronger claims 
arises because many of the theorems associated with 
least squares are part of the "folk-lore" of the field, 
and because the relevant literature is growing rapidly 
and much of it is "disguised" in the context of other 
branches of mathematics or science. The most closely
related paper of which we are aware is that of Rao
[1962]; our work was done independently of his. (The
relevance of the very recent paper of Chipman and
Rao [1964], which contains other references of interest,
is detailed at the end of section 5.2.) Valuable summaries
of various aspects of the theory of least squares
can be found in Deming [1943], Plackett [1949, 1960],
Rao [1946], and Scheffé [1959].

The foundation of least-squares estimation theory
is the well-known Gauss theorem, which can be proved
in a number of ways, e.g., by linear vector space techniques
as in Scheffé (op. cit.) or by the method of Lagrange
multipliers as in Plackett [1960]. We shall
present a proof suggested by the properties of generalized
inverses of matrices, an idea motivated quite
naturally by the possible singularity of the coefficient
matrix in the usual normal equations. It will be shown
that any one of a wider class of matrices, which we
call weak generalized inverses, can serve equally well.
The properties of weak generalized inverses appear
interesting in their own right; they are developed in
section 2, are applied to the derivation of the Gauss
theorem in section 3, and are involved implicitly or
explicitly throughout the rest of the paper as well.

One complication of the customary statistical
model which often arises in practice is the imposition
of known linear restraints on the parameters.
In section 4 the Gauss theorem is extended to this
case. For a careful analysis it is important to distinguish
clearly between artificial constraints (imposed
to obtain unique solutions) and "real" ones, and among
the latter class to exploit the distinction between those
constrained functions which were estimable before
the restraints were imposed and those which are estimable
only by virtue of the restraints.

Another frequent complication, the possibility of a
singular variance-covariance matrix for the observations,
is discussed in section 5. It is shown how this
deviation from the "standard model" can be replaced
by the adjunction of linear restraints, and vice versa.
Models involving both kinds of complications are
treated. Applications of the general theory are made
to the case of equicorrelated variables (including the
possibility of estimation in some cases despite ignorance
of the correlation), and to the case of linear
"restraints" subject to random error. The topic of
stepwise linear estimation, which has aroused considerable
interest recently, is examined in section
5.5 (cf. Freund, Vail, and Clunies-Ross [1961], Goldberger
and Jochems [1961]).

The style of the paper represents a compromise
between (1) the desire to have it serve as a useful
statistical reference as well as a vehicle of research
communication, and (2) the need to avoid a length and 
prolixity which surely would induce "battle fatigue" 
in readers and authors alike. On the one hand, 
additional information and "sidelights" appear 
throughout as corollaries and informal remarks. 
Also, the more familiar matrix techniques have been 
used in preference to vector space concepts, at the 
cost of some awkwardness at points where the "linear 
geometry" approach is the really natural one. Proofs 
have been written out in fairly full detail (except for 
matrix-algebraic steps). It is hoped that these 
policies make the paper more valuable and accessible 
to a wider range of readers. On the other hand, it 
has been necessary to presuppose a rather mature 
grasp of matrix theory and manipulations. A serious 
expository gap (which we hope some colleague will 
fill) is the omission of any discussion of computational 
methods for the calculations required in utilizing 
the theory, and also the absence of concrete and non- 
trivial numerical examples. Inclusion of such ma- 
terial, though desirable for completeness, would have 
interrupted the logical pattern of the theoretical 
development. 

It is a pleasure to acknowledge the many fruitful, 
often heated, but always stimulating discussions with 
J. M. Cameron (NBS Statistical Engineering Labora- 
tory) which have continued over many years. With- 
out his constant interest, this paper would never have 
been written. Colleagues at the Mathematics Re- 
search Center, whose helpful comments have in- 
fluenced the present version of the material, include 
H. Reinhardt and J. C. Boot. We also acknowledge 
with thanks a constructive reading of our paper by 
T. N. E. Greville.

2. Weak Generalized Inverses 

In this section we define weak generalized inverses
and develop some of their properties. Let $X$ be a $p \times n$
matrix. As a special case of what follows, we shall
show that there exists an $n \times p$ matrix $X^+$ with the
properties³

(a) $XX^+X = X$
(b) $X^+XX^+ = X^+$
(c) $(X^+X)' = X^+X$
(d) $(XX^+)' = XX^+$.        (2.1)

The matrix $X^+$ is unique (this will not be proved in the
present paper) and is called the generalized inverse
of $X$. Further details on this topic can be found in
the excellent review paper by Greville [1959]. Sometimes
$X^+$ is called a pseudo-inverse or a Moore-Penrose
inverse, the latter association referring to Moore [1935]
who originally discovered its properties, and to Penrose
[1953] who later rediscovered and developed them
further.

³ A superscript prime will always denote (vector or matrix) transposition; the original
definitions involved the complex-conjugate transpose, but we deal only with real matrices.



Our approach to this material is based on the fol- 
lowing lemmas whose proofs (although simple) are 
given for completeness. 

Lemma 1. Let $A$ be a $p \times p$ symmetric matrix of
rank $q$ ($q < p$), and $K$ a $p \times r$ matrix of rank $r = p - q$.
Then there exists a $p \times r$ matrix $H$ with the properties

(a) $H'A = 0$
(b) $\det(H'K) \neq 0$        (2.2)

if and only if the square symmetric matrix

$$M = \begin{bmatrix} A & K \\ K' & 0 \end{bmatrix}$$

is nonsingular. In this case any $H$ of rank $r$ obeying
(2.2a) can be used as the $H$ in (2.2b). Furthermore,
$M^{-1}$ has the form

$$M^{-1} = \begin{bmatrix} C & H(K'H)^{-1} \\ (H'K)^{-1}H' & 0 \end{bmatrix}.$$

Proof. First assume $M$ nonsingular, and let

$$M^{-1} = \begin{bmatrix} C & C_1 \\ C_1' & C_2 \end{bmatrix}$$

where $C$ is symmetric and $p \times p$, $C_2$ is symmetric and
$r \times r$, and $C_1$ is $p \times r$. The multiplication⁴ $MM^{-1} = I$
implies

$$AC + KC_1' = I.$$

Now choose any $p \times r$ matrix $H$ of rank $r$ obeying (2.2a).
Such matrices certainly exist. Premultiply the last
equation by $H'$ to obtain $H'KC_1' = H'$; since $H'$ is of
rank $r$, $H'K$ must have rank $\geq r$ (and thus exactly $r$
since $K$ has rank $r$), so that (2.2b) holds. To prove
the converse, let $H$ be any $p \times r$ matrix obeying (2.2).
Then by (2.2b) $H$ must have rank $r = p - q$, and from
this and (2.2a) it follows that any $p \times p$ matrix $B$ with
$H'B = 0$ has the form $B = AD$ for some $p \times p$ matrix $D$.
Now specialize to $B = I - K(H'K)^{-1}H'$ and use the
resulting matrix $D$ to define⁵

$$C = [I - H(K'H)^{-1}K']D.$$

If this matrix, together with $C_1 = H(K'H)^{-1}$ and $C_2 = 0$,
is substituted in the $M^{-1}$ formula given above, then it
is easily verified that $MM^{-1} = I$, so that $M$ is nonsingular
and the proof is complete.



⁴ The symbol $I$ will always denote an identity matrix of appropriate dimension.

⁵ We remark that the matrix $C$ can be written explicitly as $C = [I - H(K'H)^{-1}K'][A + KH']^{-1}$.
One verification employs the properties (2.3b) and (3.12) of the "true $C$," whose
existence is shown in Lemma 2, to check that using the indicated formula in the upper left
block in $M^{-1}$ does in fact lead to $MM^{-1} = I$. Another formula not requiring knowledge of
$H$, and verifiable using (3.12b) and its consequence $(A + KK')^{-1}K = H(K'H)^{-1}$, is
$C = (A + KK')^{-1} - (A + KK')^{-1}KK'(A + KK')^{-1}$.
Lemma 2. Let $A$, $K$, $M$ be as in Lemma 1 and assume
$M$ nonsingular. Then there is a unique $p \times p$
symmetric matrix $C$ associated to $K$, with the property
that for at least one $H$ obeying (2.2),

(a) $K'C = 0$
(b) $AC = I - K(H'K)^{-1}H'$.        (2.3)

Furthermore $C$ obeys (2.3b) for every $H$ satisfying (2.2),
and has the additional properties

(a) $C = CAC$, $A = ACA$
(b) $C$ is of rank $q$.        (2.4)

Proof. For any $H$ obeying $H'A = 0$ and
$\det(H'K) \neq 0$, it is easily verified that any symmetric
$p \times p$ matrix $C$ satisfying (2.3) must be a block of $M^{-1}$
placed as in the formula for $M^{-1}$ given in Lemma 1.
Furthermore, such a $C$ does satisfy (2.3). Since $M$ depends
only on $K$ (i.e., not on the choice of $H$), the same
is true of $M^{-1}$ and therefore of $C$. Since $CK = 0$,
premultiplication of (2.3b) by $C$ yields $CAC = C$, which
implies that the rank of $C$ is at most that of $A$. Since
$H'A = 0$, postmultiplication of (2.3b) by $A$ yields
$ACA = A$, which implies that the rank of $A$ is at most
that of $C$. Thus (2.4) is proved.⁵ᵃ

It is interesting to observe, from eqs (2.2) through
(2.4), that the relationship between the pairs $(A, H)$
and $(C, K)$ is symmetric. Also, property (2.4a) shows
that $C$ enjoys properties (2.1a) and (2.1b) of $A^+$. Since
$A$ and $C$ are symmetric, (2.1c) and (2.1d) read

$$AC = CA. \tag{2.5}$$

This will certainly hold (by (2.3b)) if $K = H$, an allowable
choice of $K$ in accordance with (2.2) since $\det(H'H) \neq 0$
if $H$ ($p \times r$) is of rank $r$. Equation (2.5) will not hold in
general,⁶ but we shall not require it and so can permit
ourselves the freedom of choosing $K$ different from $H$.
The case $q = p$ (i.e., $A$ nonsingular) can be included
by appropriate formal conventions concerning "vacuous
blocks" in the block matrix $M$ and its inverse;
this will be assumed done wherever appropriate, the
result (by (2.4a)) being of course $C = A^{-1}$.
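The statements of Lemmas 1 and 2 can be checked numerically on the small example given in the footnote on the failure of eq (2.5). The following is a minimal sketch, not part of the original development: it assumes NumPy, and the helper name `associated_C` is hypothetical, introduced here only for illustration. It forms $M$, reads $C$ off as the upper left block of $M^{-1}$, and tests (2.2) through (2.5).

```python
import numpy as np

def associated_C(A, K):
    """Hypothetical helper: upper-left p x p block of inv([[A, K], [K', 0]])."""
    p, r = K.shape
    M = np.block([[A, K], [K.T, np.zeros((r, r))]])
    return np.linalg.inv(M)[:p, :p]

# Matrices of the footnoted example: A has rows (1, 0) and (0, 0).
A = np.array([[1.0, 0.0], [0.0, 0.0]])
H = np.array([[0.0], [1.0]])   # H'A = 0
K = np.array([[1.0], [1.0]])   # det(H'K) = 1, so (2.2) holds
I = np.eye(2)

C = associated_C(A, K)
checks = {
    "(2.2a) H'A = 0": np.allclose(H.T @ A, 0),
    "(2.3a) K'C = 0": np.allclose(K.T @ C, 0),
    "(2.3b) AC = I - K(H'K)^-1 H'": np.allclose(
        A @ C, I - K @ np.linalg.inv(H.T @ K) @ H.T),
    "(2.4a) CAC = C": np.allclose(C @ A @ C, C),
    "(2.4a) ACA = A": np.allclose(A @ C @ A, A),
    "(2.4b) rank C = rank A": bool(
        np.linalg.matrix_rank(C) == np.linalg.matrix_rank(A)),
}
for name, ok in checks.items():
    print(name, ok)

# Eq (2.5), AC = CA, fails for this K but holds for the choice K = H.
commutes_general = bool(np.allclose(A @ C, C @ A))
C_H = associated_C(A, H)
commutes_K_eq_H = bool(np.allclose(A @ C_H, C_H @ A))
print("AC = CA with K != H:", commutes_general)
print("AC = CA with K = H:", commutes_K_eq_H)
```

In this sketch the choice $K = H$ reproduces the Moore-Penrose inverse of $A$, in line with the remark that (2.5) supplies the remaining properties (2.1c) and (2.1d).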

The next lemma and its use in the following theorem 
are not strictly necessary for our purposes, but are 
included to round out the theory. 

Lemma 3. Let $A$ be a symmetric $p \times p$ matrix.
Then every symmetric $p \times p$ matrix $C$ related to $A$ by
(2.4a) arises from some $K$ as above.

Proof. Let $q$ and $r$ be as above, and let $H$ be any
$p \times r$ matrix of rank $r$ such that $H'A = 0$. Let $K$ ($p \times r$)
consist of $r$ columns of $I - AC$ in the same positions as
$r$ independent columns of $H'$. Since $H'(I - AC) = H'$,
it follows that $H'K$ is nonsingular and thus that $K$ has
rank $r$. Also since $C(I - AC) = 0$, it follows that
$CK = 0$ and therefore $K'C = 0$. To verify (2.3b), first



⁵ᵃ From the last formula in footnote 5, we see that $C$ is determined by $K$ only via $KK'$;
e.g., $C$ is unchanged if $K$ is replaced by some ($p \times r$) $KL$ with $LL' = I$.

⁶ For a specific example in which eq (2.5) fails, take the rows of $A$ to be $(1, 0)$ and $(0, 0)$,
$H' = (0, 1)$, $K' = (1, 1)$; the rows of $C$ are $(1, -1)$ and $(-1, 1)$.



observe that (2.4a) implies (2.4b), so that the equation
$C(I - AC) = 0$ proves $I - AC$ to have rank not exceeding
$p - q = r$. Thus the columns of $I - AC$ not in $K$ are
linear combinations of the columns of $K$, i.e., we can
write

$$I - AC = KE$$

for some $r \times p$ matrix $E$. Then

$$H' = H'(I - AC) = H'KE,$$

so that

$$(H'K)^{-1}H' = E$$

and therefore

$$I - AC = K(H'K)^{-1}H',$$

completing the proof.

Now let $X$ be a $p \times n$ matrix. A weak generalized
inverse of $X$ is an $n \times p$ matrix $X^-$ with the first three
of properties (2.1), i.e.,

(a) $XX^-X = X$
(b) $X^-XX^- = X^-$
(c) $(X^-X)' = X^-X$.        (2.6)

The following theorem, which characterizes the class
of all weak generalized inverses of $X$, in particular
establishes the existence of at least one such inverse.
Theorem. Let $X$ be a $p \times n$ matrix. The $n \times p$
matrix $X^-$ is a weak generalized inverse of $X$ if and
only if

$$X^- = X'C$$

for some $C$ associated to $A = XX'$ as in Lemma 2.

Proof. First suppose $X^- = X'C$ with $C$ associated
to $A$ as above. Property (2.6a) reads $ACX = X$, and
follows from (2.3b) upon noting that $H'AH = 0$ implies⁷
$H'X = 0$. Property (2.6b) reads $X'CAC = X'C$ and
follows from (2.4a), while (2.6c) asserts the symmetry
of $X'CX$ and is a consequence of the symmetry of $C$.
To prove the converse, assume $X^-$ is any $n \times p$ matrix
obeying (2.6). By (2.6b) and (2.6c),

$$X^- = (X^-X)X^- = X'[(X^-)'X^-].$$

By Lemma 3, it suffices to prove that $C = (X^-)'X^-$ obeys
(2.4a). This follows from

$$CAC = (X^-)'X^-X(X^-X)'X^- = (X^-)'X^-XX^-XX^- = (X^-)'X^-XX^- = (X^-)'X^- = C,$$

$$ACA = X(X^-X)'X^-XX' = XX^-XX^-XX' = XX^-XX' = XX' = A.$$



⁷ The last equation implies that the sum of the squares of the entries in each row of
$H'X$ vanishes.
We note in passing that with $X^- = X'C$, $X^-$ obeys
(2.1d) if and only if eq (2.5) holds. Thus (2.1d) holds
if the choice $K = H$ is made (this completes the proof
that $X$ has a generalized inverse $X^+$), but as mentioned
earlier we shall not have to impose this requirement.
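The characterization $X^- = X'C$ lends itself to a direct numerical check. The sketch below is an illustration only: it assumes NumPy, uses a hypothetical one-way layout ($p = 3$, $n = 4$, rank $q = 2$) as the matrix $X$, and the helper names `associated_C`, `weak_inverse`, and `penrose_properties` are not from the paper. It builds $X^-$ for a choice $K \neq H$ and for $K = H$ and reports which of the properties (2.1a) through (2.1d) hold.

```python
import numpy as np

def associated_C(A, K):
    """Hypothetical helper: upper-left p x p block of inv([[A, K], [K', 0]])."""
    p, r = K.shape
    M = np.block([[A, K], [K.T, np.zeros((r, r))]])
    return np.linalg.inv(M)[:p, :p]

def weak_inverse(X, K):
    """Weak generalized inverse X^- = X'C, with C associated to A = XX'."""
    return X.T @ associated_C(X @ X.T, K)

def penrose_properties(X, G):
    """Which of (2.1a)-(2.1d) hold for a candidate inverse G of X."""
    return {
        "a": bool(np.allclose(X @ G @ X, X)),
        "b": bool(np.allclose(G @ X @ G, G)),
        "c": bool(np.allclose((G @ X).T, G @ X)),
        "d": bool(np.allclose((X @ G).T, X @ G)),
    }

# Assumed example: rows of X correspond to (mu, alpha_1, alpha_2),
# columns to four observations, two in each group.
X = np.array([[1.0, 1.0, 1.0, 1.0],
              [1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0]])
H = np.array([[1.0], [-1.0], [-1.0]])  # H'A = 0 for A = XX'
K = np.array([[0.0], [0.0], [1.0]])    # H'K = -1, so (2.2) holds

X_weak = weak_inverse(X, K)    # K != H: only (2.6a)-(2.6c) are guaranteed
X_plus = weak_inverse(X, H)    # K = H: the generalized inverse X^+

props_weak = penrose_properties(X, X_weak)
props_plus = penrose_properties(X, X_plus)
print("K != H:", props_weak)
print("K  = H:", props_plus)
```

For this particular $K$ the matrix $AC = I - K(H'K)^{-1}H'$ is not symmetric, so property (d) fails for `X_weak`, while the choice $K = H$ agrees with NumPy's `numpy.linalg.pinv`.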

In what follows the notations $H$, $K$, $C$, $X^-$ will have
the same significance as in this section, $A$ will stand
for $XX'$, and the notation $X^+$ will be reserved for the
generalized inverse. Note that $X^-$ and $C$ need not be
uniquely determined by $X$ (although $X^+$ is), but depend
on the choice of $K$. The relations

$$ACX = X \tag{2.7}$$

$$H'X = 0 \tag{2.8}$$

obtained in connection with the last proof are recorded
here for subsequent reference.

3. Fundamental Gauss Theorem for Linear 
Estimation 

The methods of least squares have been in use now 
for over 150 years. Gauss [1873] in 1821 (collected 
works 1873) is now credited with placing the method 
on a sound theoretical basis without any assumptions 
that the random variables follow a normal distribution. 
Gauss's contribution was for a time neglected until 
Markov [1912] "rediscovered" the work of Gauss. It 
should be noted that Legendre [1806] in 1806 was the 
first to publish the method of least squares, although 
apparently Gauss had known about it some years previously.
For a more detailed historical introduction
consult Merriman [1877], Plackett [1949], and Eisenhart
[1964].

In this section we apply the properties of the weak
generalized inverse to obtain a proof of the Gauss
theorem. The relevance of the generalized inverse to
the theory of least squares has been noted by Bjerhammar
[1951], Greville [1960], and Penrose [1956]. The
fundamental result used in these papers is that for
an over-determined system of linear equations

$$X'b = y, \qquad y' = (y_1, y_2, \ldots, y_n), \qquad b' = (b_1, b_2, \ldots, b_p),$$

the selection of $b$ which minimizes the sum of squares
of residuals, $(y - X'b)'(y - X'b)$, is given by

$$b = (X')^+ y,$$

where $(X')^+$ is the generalized inverse of $X'$. It is
easily verified using (2.1) that $(X')^+ = (X^+)'$, so that by
(2.1a) and (2.1c)

$$Ab = XX'(X^+)'y = X(X^+X)'y = (XX^+X)y = Xy,$$

i.e., $b$ must be a solution of the usual normal equations
$Ab = Xy$ of least-squares theory. More recently Rao
[1962] has used property (2.4a) to demonstrate some of
the well-known results associated with minimum
variance linear unbiased estimation.



Before stating the theorem we review the central
idea of an estimable parameter, cf. Bose [1944]. Let
$Y' = (Y_1, \ldots, Y_n)$ be a vector of random variables
having a distribution which depends on a parameter $\theta$.
A function $g(Y)$ of the random vector $Y$ is called an
unbiased estimate of $\theta$ if $E[g(Y)] = \theta$ for all values of
$\theta$, where this last phrase may reflect limitations on the
possible values of $\theta$ imposed by the problem at hand.
The parameter $\theta$ is called estimable if it has at least
one unbiased estimate of some form prescribed by the
context. In this paper we deal only with linear
estimates

$$g(Y) = d'Y + c$$

where $d' = (d_1, \ldots, d_n)$ is a $1 \times n$ vector and $c$ is a
scalar. A best (unbiased linear) estimate of $\theta$ is one
which has minimum variance among the class of unbiased
linear estimates of $\theta$.

Theorem 1 (Gauss). Let $X$ be a $p \times n$ matrix
($p \leq n$) of known constants having rank $q$, $\beta$ a $p \times 1$
vector of unknown parameters, and $Y$ an $n \times 1$ vector
of random variables such that⁸

$$E(Y) = X'\beta, \qquad \operatorname{var}(Y) = \sigma^2 I. \tag{3.1}$$

The minimum variance unbiased linear estimate of any
estimable linear function $\theta = l'\beta$ of $\beta$ (where $l$ is a
$p \times 1$ vector) is

$$\hat\theta = l'(X^-)'Y = l'CXY.$$

For all such $\theta$, $\hat\theta$ can be obtained as $l'\hat\beta$ where $\hat\beta$
(independent of $l$) is any vector minimizing the quadratic
form $(Y - X'\beta)'(Y - X'\beta)$, or equivalently is any
solution of the normal equations

$$A\hat\beta = XY \qquad (A = XX') \tag{3.2}$$

whose general solution can be written

$$\hat\beta = CXY + (I - CA)z \tag{3.3}$$

with $z$ an arbitrary $p \times 1$ vector.



Proof: Let $d$ be an $n \times 1$ vector. Then we remark
that the unbiased linear estimates of $\theta = l'\beta$ are
precisely the linear forms $d'Y$ with $d$ obeying

$$Xd = l, \tag{3.4}$$

so that $\theta$ is estimable if and only if (3.4) has a solution
$d$. Indeed, the function $d'Y + c$ is an unbiased estimate
of $\theta$ if and only if, for all values of $\theta$,

$$l'\beta = \theta = E(d'Y + c) = d'E(Y) + c = d'X'\beta + c.$$

⁸ If $Y' = (Y_1, \ldots, Y_n)$, then $E(Y)$ is the vector with $E(Y_k)$ as $k$th component, and $\operatorname{var}(Y)$
is the $n \times n$ matrix with $\operatorname{Cov}(Y_i, Y_j)$ as $(i, j)$th entry.
Whether $l = 0$ (so that 0 is the only value of $\theta$) or
$l \neq 0$ (so that $\theta$ assumes all real values), this will be
true if and only if $c = 0$ and $d$ obeys the system (3.4).
The key idea is to seek a linear change of (random)
variable from $Y$ to $\hat\beta = B'Y$, where $B$ is an $n \times p$ matrix
so chosen that for each estimable $\theta = l'\beta$, at least one
unbiased linear estimate of $\theta$ can be written in the
form $l'\hat\beta$. That is, for at least one $d_1$ obeying (3.4) the
identity

$$l'B'Y = l'\hat\beta = d_1'Y$$

is to hold, or equivalently⁹ $d_1 = Bl$. For any pair of
vectors $l$ and $d$ related by (3.4) we would have

$$Xd = l = Xd_1 = XBl = XBXd,$$

and since every $d$ is related to some $l$ by (3.4) (just
define $l$ by (3.4)) the equality between the end terms
of the last display is an identity in $d$. This shows that
$B$ must be chosen to obey

$$X = XBX. \tag{3.5}$$

Conversely if (3.5) holds then for each $d$ and $l$ related
by (3.4) we can set $d_1 = BXd$ so that

$$Bl = BXd = d_1 \quad \text{and} \quad Xd_1 = XBXd = Xd = l$$

as desired. Therefore (3.5) is exactly the desired
relationship, and its resemblance to (2.6a) suggests our
setting $B = X^-$ so that

$$d_1 = X^-l = X'Cl.$$

The variance of $\hat\theta = d'Y$ is $\operatorname{var}(\hat\theta) = (d'd)\sigma^2$, which is
to be minimized by a proper choice of $d$ subject to (3.4).
Define an unknown $n \times 1$ vector $\delta$ by $d = d_1 + \delta$, so that

$$d'd = d_1'd_1 + d_1'\delta + \delta'd_1 + \delta'\delta.$$

However, since $d_1$ satisfies (3.4) we have

$$d_1'\delta = l'CX(d - X'Cl) = l'Cl - l'CACl = 0,$$

so that

$$\operatorname{var}(\hat\theta) = (d_1'd_1 + \delta'\delta)\sigma^2,$$

which is minimized if and only if $\delta = 0$, i.e., $d = d_1$.
(Incidentally this shows $d_1$ independent of the choice
$B = X^-$.) Thus the unique "best estimate" is

$$\hat\theta = d_1'Y = l'CXY \tag{3.6}$$

and its variance is

$$\operatorname{var}(\hat\theta) = (d_1'd_1)\sigma^2 = (l'CXX'Cl)\sigma^2 = l'Cl\,\sigma^2. \tag{3.7}$$

We have shown that $l'\hat\beta$ is a best estimate of $\theta$ if and
only if

$$l'\hat\beta = l'CXY,$$

which, since $l = Xd_1 = ACl$, is equivalent to

$$l'C(A\hat\beta - XY) = 0. \tag{3.8}$$

This shows that any solution $\hat\beta$ of the normal equations
$A\hat\beta = XY$ yields a best estimate $l'\hat\beta$ of $\theta$. Conversely¹⁰
if (3.8) is to hold for all estimable $\theta = l'\beta$ (i.e., for all $l$
such that (3.4) has a solution $d$), then since every $d$ is
related to some $l$ by (3.4) we have

$$d'X'CA\hat\beta = d'X'CXY$$

as an identity in $d$, so that $X'CA\hat\beta = X'CXY$ and
premultiplication by $X$ (together with (2.4a) and
(2.7)) shows that $\hat\beta$ must be a solution of the normal
equations.

Since $CXY$ is a solution of the normal equations, the
general solution can be written

$$\hat\beta = CXY + \eta$$

where $\eta$ is an arbitrary $p \times 1$ vector such that $A\eta = 0$.
For any $p \times 1$ vector $z$,

$$\eta = (I - CA)z$$

satisfies this condition by (2.4a), while conversely any
$\eta$ obeying $A\eta = 0$ has the form $(I - CA)z$ with $z = \eta$.

It only remains to show that the solutions $\hat\beta$ of the
normal equations are precisely the vectors $\beta$ which
minimize the quadratic form

$$Q = (Y - X'\beta)'(Y - X'\beta).$$

For this purpose set $\beta = \hat\beta + \delta$ and observe
$X(Y - X'\hat\beta) = 0$, so that

$$Q = (Y - X'\hat\beta)'(Y - X'\hat\beta) + (X'\delta)'(X'\delta) \geq (Y - X'\hat\beta)'(Y - X'\hat\beta)$$

where equality holds if and only if $X'\delta = 0$ and thus¹⁰ᵃ
if and only if $A\delta = 0$, i.e., if and only if $\beta$ (as well as
$\hat\beta$) satisfies the normal equations.
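A small numerical illustration of Theorem 1 may help fix ideas. The sketch below is not from the paper: it assumes NumPy, a hypothetical one-way layout for $X$, made-up observations `Y`, and the helper name `associated_C`. It forms two members of the family (3.3), confirms that both solve the normal equations (3.2), and compares their values on an estimable and on a nonestimable linear function.

```python
import numpy as np

def associated_C(A, K):
    """Hypothetical helper: upper-left p x p block of inv([[A, K], [K', 0]])."""
    p, r = K.shape
    M = np.block([[A, K], [K.T, np.zeros((r, r))]])
    return np.linalg.inv(M)[:p, :p]

# Assumed example: parameters (mu, alpha_1, alpha_2), two observations per group.
X = np.array([[1.0, 1.0, 1.0, 1.0],
              [1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0]])
Y = np.array([1.0, 3.0, 4.0, 8.0])     # made-up data
H = np.array([[1.0], [-1.0], [-1.0]])  # H'A = 0
K = np.array([[0.0], [0.0], [1.0]])    # H'K = -1

A = X @ X.T
C = associated_C(A, K)

def beta_hat(z):
    """General solution (3.3) of the normal equations for a given z."""
    return C @ X @ Y + (np.eye(3) - C @ A) @ z

b0 = beta_hat(np.zeros(3))
b1 = beta_hat(np.array([5.0, -2.0, 7.0]))

l_est = np.array([0.0, 1.0, -1.0])  # alpha_1 - alpha_2: H'l = 0, estimable
l_non = np.array([0.0, 1.0, 0.0])   # alpha_1 alone: H'l != 0, not estimable

print("b0 =", b0, " b1 =", b1)
print("normal equations hold:",
      np.allclose(A @ b0, X @ Y), np.allclose(A @ b1, X @ Y))
print("estimable function:   ", l_est @ b0, l_est @ b1)
print("nonestimable function:", l_non @ b0, l_non @ b1)

# Cross-check against NumPy's least-squares solver (minimum-norm solution).
b_ls = np.linalg.lstsq(X.T, Y, rcond=None)[0]
```

The two solutions differ as vectors, yet they agree on the estimable contrast (the difference of the two group means), which is what the theorem asserts; the nonestimable function takes different values.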

The preceding analysis essentially contains the description
of the class of estimable functions. We
rephrase this in the following corollary.¹¹

Corollary 1.1: The parametric function $\theta = l'\beta$
is estimable if and only if

$$(I - AC)l = 0$$

or equivalently

$$H'l = 0. \tag{3.9}$$



Proof. We know that $\theta$ is estimable if and only if
there exists a vector $d_1 = X'Cl$ with $Xd_1 = l$. Substituting
(2.3b) yields

$$l - Xd_1 = (I - AC)l = 0 = K(H'K)^{-1}H'l$$

which implies

$$H'l = 0$$

as desired. Conversely if $(I - AC)l = 0$ then

$$l = ACl = Xd_1 \qquad (d_1 = X'Cl)$$

and if $H'l = 0$ then by (2.3b), $(I - AC)l = 0$ as well.

⁹ We assume for this motivation that the distribution of $Y$ is not concentrated on some
lower dimensional subset of $n$-dimensional space.

¹⁰ This converse, which makes the role of the normal equations precise, was not explicitly
stated as part of the theorem.

¹⁰ᵃ Clearly $X'\delta = 0$ implies $A\delta = XX'\delta = 0$; conversely $A\delta = 0$ implies $(X'\delta)'(X'\delta) = 0$
and thus $X'\delta = 0$.

¹¹ We again remind the reader that $H$ and $K$ are assumed chosen as in section 2, i.e.,
obeying (2.2).

We observe in particular that the components of $\hat\beta$
are best estimates of the corresponding components of
$\beta$ if and only if these components are in fact estimable;
by (3.4) this requires that every unit $p \times 1$ vector, and
thus every $p \times 1$ vector, be a linear combination of the
columns of $X$. In other words "$\hat\beta$ is an estimate of $\beta$"
makes sense only in the special case $q = p$, when
$C = A^{-1}$.

The next corollary pertains to finding a solution of 
the normal equations by adjoining "dummy" quanti- 
ties to obtain a system of full rank. 



Corollary 1.2: Let $\lambda$ and $m$ be $r \times 1$ vectors of
constants where $m$ is arbitrary. Then the unique
solution of the system of $(p + r)$ simultaneous linear
equations

$$\begin{bmatrix} A & K \\ K' & 0 \end{bmatrix} \begin{bmatrix} \hat\beta \\ \lambda \end{bmatrix} = \begin{bmatrix} XY \\ m \end{bmatrix} \tag{3.10}$$

yields a solution of the normal equations; the same
holds for the unique solution of the system

$$(A + HK')\hat\beta = XY + Hm. \tag{3.11}$$

Proof. The system (3.10) is of full rank since
its coefficient matrix is the $M$ of Lemma 1 in section 2.
Therefore the solution can be written

$$\begin{bmatrix} \hat\beta \\ \lambda \end{bmatrix} = \begin{bmatrix} C & H(K'H)^{-1} \\ (H'K)^{-1}H' & 0 \end{bmatrix} \begin{bmatrix} XY \\ m \end{bmatrix} = \begin{bmatrix} CXY + H(K'H)^{-1}m \\ (H'K)^{-1}H'XY \end{bmatrix}. \tag{3.11a}$$

However since $H'X = 0$, the vector $\lambda$ is identically
zero. Since $K$ has rank $r$, $m$ can be written as $K'z$
where $z$ is a $p \times 1$ vector; then the solution vector $\hat\beta$ is
$\hat\beta = CXY + (I - CA)z$, which is the general solution of
the normal equations. It also follows that the $\hat\beta$ of
(3.10)'s solution satisfies (3.11), since $K'\hat\beta = m$. It
only remains to prove that (3.11)'s solution is unique,
i.e., that $A + HK'$ is nonsingular. This is true since it
can be directly verified using (2.3b), (2.3a), and (2.8)
that

$$(A + HK')^{-1} = C + [H(K'H)^{-1} - CH](H'H)^{-1}H'. \tag{3.12}$$



For situations in which a suitable K is known but a 
suitable H is not at hand, it may be desirable to re- 
place (3.11) by an analogous system not involving H. 



Such a system is given by

$$(A + KK')\hat\beta = XY + Km, \tag{3.12a}$$

which is satisfied by the $\hat\beta$ of (3.10)'s solution, and
which has only one solution since

$$(A + KK')^{-1} = C + H(K'H)^{-1}(H'K)^{-1}H' \tag{3.12b}$$

as can be directly verified using (2.3). Lacking $H$, one
might still want to know $C$ in order to check estimability
by (3.9).
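The three full-rank systems (3.10), (3.11), and (3.12a) can be compared numerically. The following sketch is illustrative only and assumes NumPy, a hypothetical one-way layout for $X$, made-up data `Y`, and an arbitrary prescribed value `m` for the nonestimable function $K'\beta$; the helper name `associated_C` is likewise an assumption. It solves each system and checks the inverse formulas (3.12) and (3.12b).

```python
import numpy as np

def associated_C(A, K):
    """Hypothetical helper: upper-left p x p block of inv([[A, K], [K', 0]])."""
    p, r = K.shape
    M = np.block([[A, K], [K.T, np.zeros((r, r))]])
    return np.linalg.inv(M)[:p, :p]

# Assumed example: parameters (mu, alpha_1, alpha_2), two observations per group.
X = np.array([[1.0, 1.0, 1.0, 1.0],
              [1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0]])
Y = np.array([1.0, 3.0, 4.0, 8.0])     # made-up data
H = np.array([[1.0], [-1.0], [-1.0]])  # H'A = 0
K = np.array([[0.0], [0.0], [1.0]])    # prescribes alpha_2 = m
m = np.array([2.0])                    # arbitrary prescribed value

A = X @ X.T
p, r = K.shape
C = associated_C(A, K)

# System (3.10): bordered matrix with the dummy vector lambda.
M = np.block([[A, K], [K.T, np.zeros((r, r))]])
sol = np.linalg.solve(M, np.concatenate([X @ Y, m]))
beta_310, lam = sol[:p], sol[p:]

# Systems (3.11) and (3.12a): p x p full-rank alternatives.
beta_311 = np.linalg.solve(A + H @ K.T, X @ Y + H @ m)
beta_312a = np.linalg.solve(A + K @ K.T, X @ Y + K @ m)

# Inverse formulas (3.12) and (3.12b).
KH_inv = np.linalg.inv(K.T @ H)
rhs_312 = C + (H @ KH_inv - C @ H) @ np.linalg.inv(H.T @ H) @ H.T
rhs_312b = C + H @ KH_inv @ np.linalg.inv(H.T @ K) @ H.T
ok_312 = bool(np.allclose(np.linalg.inv(A + H @ K.T), rhs_312))
ok_312b = bool(np.allclose(np.linalg.inv(A + K @ K.T), rhs_312b))

print("beta from (3.10): ", beta_310, " lambda:", lam)
print("beta from (3.11): ", beta_311)
print("beta from (3.12a):", beta_312a)
print("(3.12) holds:", ok_312, " (3.12b) holds:", ok_312b)
```

All three systems return the same solution of the normal equations, the one satisfying $K'\hat\beta = m$, and the dummy vector $\lambda$ comes out as zero, as the proof of the corollary indicates.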

From the criterion (3.9) and the fact that $K$ is of
rank $r$, it follows that the elements of $K'\beta$ are an
independent set of nonestimable functions with the
additional property that $[A, K]$ has independent rows.
The analysis of (3.10), together with the Gauss theorem,
shows that the values of these nonestimable linear
functions can be prescribed in any way (i.e., $K'\hat\beta = m$)
without affecting the best estimates of the estimable
functions; $\hat\beta$ depends on $m$ but $\hat\theta = l'\hat\beta$ (where $\theta = l'\beta$
is estimable) does not. The results of prescribing (in
a self-consistent way) the values of an arbitrary set of
linear forms in $\beta$ are treated in section 4.

It is natural to inquire as to the significance of
$l'\hat\beta$ when $\theta = l'\beta$ is not necessarily estimable. One
form of the answer is given in the next corollary.

Corollary 1.3: Let $\hat\beta = CXY$. Then for any
$\theta = l'\beta$, there is a unique¹¹ᵃ estimable function $\theta_1 = l_1'\beta$,
namely $\theta_1 = l'CA\beta$, such that $l'\hat\beta$ is the best estimate
of $\theta_1$.

Proof. First assume $l_1 = ACl$ and $\theta_1 = l_1'\beta$, so that
$\theta_1$ is estimable by Corollary 1.1. Then the Gauss
theorem implies that $l'\hat\beta$ is the best estimate of $\theta_1$,
since by (2.4)

$$l'\hat\beta = l'CXY = l'CACXY = (ACl)'CXY = l_1'\hat\beta.$$

To prove the uniqueness of $\theta_1$, consider any estimable
$\theta_1 = l_1'\beta$ such that $l'\hat\beta$ is the best estimate of
$\theta_1$. Then by Corollary 1.1 $H'l_1 = 0$, so that $l_1 = A\eta$ for
some $p \times 1$ vector $\eta$. Also we must have

$$l'CXY = l'\hat\beta = l_1'\hat\beta = \eta'ACXY = \eta'XY,$$

so that $X'Cl = X'\eta$ and therefore

$$l_1 = XX'\eta = XX'Cl = ACl$$

as asserted.

For completeness we include some facts
about the vector of residuals

$$\delta = Y - X'\hat\beta$$

(where $\hat\beta$ is any solution of the normal equations¹²) and
the usefulness in estimating $\sigma^2$ of its squared length
(the residual sum of squares)

$$S^2 = \delta'\delta = (Y - X'\hat\beta)'(Y - X'\hat\beta) = Y'Y - \hat\beta'A\hat\beta.$$

¹¹ᵃ The uniqueness assertion requires the assumption mentioned in footnote 9. Note
that a definite choice of $C$ is assumed.

¹² Note that $\delta$ is independent of the choice of $\hat\beta$.

Corollary 1.4: The residual vector is uncorrelated
with the estimate of any estimable function; in
fact¹²ᵃ

$$\operatorname{Cov}(\hat\beta, \delta) = 0.$$

Furthermore we have

$$E(S^2) = (n - q)\sigma^2.$$

Proof. The expected value of the residual
vector is

$$E(\delta) = E(Y) - E(X'\hat\beta) = 0.$$

Therefore

$$\operatorname{Cov}(\hat\beta, \delta) = E\{\hat\beta\delta'\} - E\{E(\hat\beta)\delta'\} = E\{[CXY][Y - X'\hat\beta]'\} = CX\,E(YY')(I - X'CX).$$

Since

$$E(YY') = \sigma^2 I + (X'\beta)(X'\beta)'$$

we have

$$\operatorname{Cov}(\hat\beta, \delta) = CX[\sigma^2 I + X'\beta\beta'X][I - X'CX] = C[\sigma^2 I + A\beta\beta'][X - ACX] = 0$$

because $X = ACX$ from (2.7).

To prove the second assertion we use the general
formula

$$E(Y'BY) = E(Y')\,B\,E(Y) + \operatorname{trace}(B \operatorname{var}(Y))$$

for the mean of a quadratic form, with

$$B = (I - X'CX)^2 = I - X'CX,$$

to obtain

$$E(S^2) = E(\delta'\delta) = \beta'X(I - X'CX)X'\beta + \sigma^2 \operatorname{trace}(I - X'CX) = \sigma^2 \operatorname{trace}(I - X'CX) = \sigma^2[n - \operatorname{trace}(X'CX)].$$

By the general formula $\operatorname{trace}(M_1M_2) = \operatorname{trace}(M_2M_1)$,
where $M_1$ and $M_2'$ are rectangular matrices of the same
dimensions, we have

$$\operatorname{trace}(X'CX) = \operatorname{trace}(XX'C) = \operatorname{trace}(AC) = \operatorname{trace}(I - K(H'K)^{-1}H') = p - \operatorname{trace}[(H'K)(H'K)^{-1}] = p - r = q,$$

completing the proof.

A final comment deals with the maximum possible
number of linearly independent nonestimable parametric
functions $\theta = l'\beta$. If $q = p$ (i.e., $A$ is nonsingular)
then this number is zero; every linear function
of $\beta$ is estimable since $\beta$ itself is estimable
(see the remarks after Corollary 1.1). If $q < p$,
however, then the number is $p$ rather than $r$ (as is
occasionally suggested). This can be seen by partitioning
$A = [A_1, A_2]$, where $A_1$ consists of $q$ independent
columns of $A$. Then the $p \times p$ matrix $[H, A_1]$
is nonsingular, since $H'H$ and $A_1'A_1$ are nonsingular
and

$$\begin{bmatrix} (H'H)^{-1}H' \\ (A_1'A_1)^{-1}A_1' \end{bmatrix} [H, A_1] = I.$$

Therefore, if $h$ is a column of $H$ and $\alpha_1, \alpha_2, \ldots, \alpha_q$
are the columns of $A_1$, then

$$N = [H, \alpha_1 + h, \alpha_2 + h, \ldots, \alpha_q + h]$$

has the same determinant as $[H, A_1]$ and so is also
nonsingular. The $p$ columns of $N$ are therefore the
vectors "$l$" of coefficients of $p$ independent parametric
functions, which are all nonestimable since the nonsingularity
of $H'H$ implies that no column of

$$H'N = [H'H, H'h, H'h, \ldots, H'h]$$

is the zero vector.

¹²ᵃ By definition, $\operatorname{Cov}(\hat\beta, \delta)$ is the matrix $E\{[\hat\beta - E(\hat\beta)][\delta - E(\delta)]'\}$.
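The matrix identities behind Corollary 1.4 can be verified directly. The sketch below is an illustration under stated assumptions (NumPy, a hypothetical one-way layout for $X$, made-up data `Y`, and the helper name `associated_C`, none of which come from the paper). It checks that $I - X'CX$ is symmetric and idempotent with trace $n - q$, that $X(I - X'CX) = 0$ (the step that makes the covariance vanish), and that the two expressions for $S^2$ agree.

```python
import numpy as np

def associated_C(A, K):
    """Hypothetical helper: upper-left p x p block of inv([[A, K], [K', 0]])."""
    p, r = K.shape
    M = np.block([[A, K], [K.T, np.zeros((r, r))]])
    return np.linalg.inv(M)[:p, :p]

# Assumed example: p = 3 parameters, n = 4 observations, rank q = 2.
X = np.array([[1.0, 1.0, 1.0, 1.0],
              [1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0]])
Y = np.array([1.0, 3.0, 4.0, 8.0])     # made-up data
H = np.array([[1.0], [-1.0], [-1.0]])  # H'A = 0
K = np.array([[0.0], [0.0], [1.0]])    # H'K = -1

A = X @ X.T
n = X.shape[1]
q = int(np.linalg.matrix_rank(X))
C = associated_C(A, K)

R = np.eye(n) - X.T @ C @ X            # the matrix B of the proof
is_symmetric = bool(np.allclose(R, R.T))
is_idempotent = bool(np.allclose(R @ R, R))
trace_R = float(np.trace(R))           # should equal n - q
annihilates_X = bool(np.allclose(X @ R, 0))   # X - ACX = 0, eq (2.7)

# Residual vector and the two expressions for the residual sum of squares.
beta = C @ X @ Y
delta = Y - X.T @ beta
S2_direct = float(delta @ delta)
S2_formula = float(Y @ Y - beta @ A @ beta)

# The residual projector does not depend on the choice of K.
R_other = np.eye(n) - X.T @ associated_C(A, H) @ X
same_projector = bool(np.allclose(R, R_other))

print("trace(I - X'CX) =", trace_R, " n - q =", n - q)
print("S^2 =", S2_direct, S2_formula)
```

Dividing $S^2$ by $n - q$ would then give the usual unbiased estimate of $\sigma^2$ implied by the corollary; with the made-up data above this is a single numerical value and carries no statistical meaning.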



4. Gauss Theorem With Given Restraints 

Often experimental situations arise in which the 
parameters (components of p) are connected by 
known linear relations. It is not generally realized 
that some of the linear forms whose values are pre- 
scribed by these given restraints may be estimable with 
respect to the equations of condition $E(Y) = X'\beta$,
where as before we assume $X$ is $p \times n$ ($p \leq n$) and of
rank $q$. In this section we discuss the appropriate extension
of the Gauss theorem when these equations of
condition are supplemented by known linear con- 
straints. It will be shown that several applications of 
the "simple" Gauss theorem of section 3 suffice to 
reduce such problems to purely matrix-theoretic 
questions. 

We introduce the term pre-estimable to be used in 
this section for those parametric functions (linear in $\beta$)
which are estimable with respect to $E(Y) = X'\beta$. The
term estimable will refer to the parametric functions 
which are estimable with all the given information in- 
cluding the restraints. Clearly every pre-estimable 
parameter is also estimable, but the converse need not 
hold; for example a nonpre-estimable function whose 
value is specified by one of the given constraints is 
obviously estimable. 

We will find it convenient to assume that the constraints
have been brought into an "irreducible form"
in a sense made precise in this and the next few
paragraphs. Suppose the initially given restraints are

$$L'\beta = m$$

where $L'$ is $k \times p$ with rank $k$, and $m$ is $k \times 1$. The
matrix $L$ can be partitioned into $L = (L_1, L_2)$ where $L_i$
is $p \times s_i$ ($k = s_1 + s_2$) with rank $s_i$ such that $L_1'\beta = m_1$ is
nonpre-estimable and $L_2'\beta = m_2$ is pre-estimable; i.e.,
$H'L_1$ has no zero column and $H'L_2 = 0$. Since $H$ has
rank $r = p - q$, we know that $s_2 \leq q$. Furthermore from
the remarks at the end of section 3, the maximum number
of linearly independent nonpre-estimable restraints¹³
is $p$; hence $s_1 \leq p$.

Let the rank of the $r \times s_1$ matrix $H'L_1$ be $\nu$. There
will then exist an $s_1 \times (s_1 - \nu)$ matrix $G$ with rank $s_1 - \nu$
such that $H'L_1G = 0$. Also there will exist an $s_1 \times \nu$
matrix $F$ with rank $\nu$ such that $F'G = 0$. We can for
example take $F'$ to consist of $\nu$ linearly independent
rows of $H'L_1$. Then the square matrix of order $s_1$,
$S = (F, G)$, has rank $s_1$. Hence we can premultiply the
nonpre-estimable restraints $L_1'\beta$ by $S'$ to obtain¹⁴

$$\begin{bmatrix} F' \\ G' \end{bmatrix} L_1'\beta = \begin{bmatrix} F'L_1'\beta \\ G'L_1'\beta \end{bmatrix} = \begin{bmatrix} F'm_1 \\ G'm_1 \end{bmatrix}.$$

Since $H'L_1G = 0$, the $(s_1 - \nu)$ restraints $G'L_1'\beta$ are pre-estimable.
It is clear that the restraints $F'L_1'\beta$ are
nonpre-estimable, as $H'L_1F$ has no zero column by virtue
of $F'G = 0$. Thus the original $k$ restraints $L'\beta = m$
may be regarded as being transformed into two sets

$$K_1'\beta = F'm_1, \qquad K_2'\beta = \begin{bmatrix} G'm_1 \\ m_2 \end{bmatrix}$$

(whose right-hand sides play the roles of new vectors $m_1$ and $m_2$),
where $K_i$ is $p \times k_i$ with rank $k_i$ such that

$$K_1' = F'L_1', \qquad k_1 = \nu$$

$$K_2' = \begin{bmatrix} G'L_1' \\ L_2' \end{bmatrix}, \qquad k_2 = (s_1 - \nu) + s_2$$

with $H'K_2 = 0$. Furthermore the rank of the $r \times k_1$
matrix $H'K_1 = H'L_1F$ ($k_1 \leq p$) is¹⁴ᵃ $k_1$ and hence the
rank of $K_1$ is also $k_1$.

When the given $s_1$ nonpre-estimable restraints $L_1'\beta$
are such that the rank $\nu$ of $H'L_1$ is $s_1$ (equal to the number
of nonpre-estimable restraints) then these restraints
will be termed irreducible restraints. Alternatively if
the rank $\nu$ of $H'L_1$ is $< s_1$ (smaller than the number of
nonpre-estimable restraints) the restraints $L_1'\beta$ will be
called reducible restraints since it is then possible
(as was just shown) to obtain pre-estimable restraints
from them. Unless otherwise indicated the given
restraints in this section will be denoted by

$$K_1'\beta = m_1, \qquad K_2'\beta = m_2$$

where $K_i$ is $p \times k_i$ and has rank $k_i$. Furthermore the
restraints $K_1'\beta$ are a set of $k_1$ irreducible nonpre-estimable
restraints and $K_2'\beta$ denotes a set of $k_2$ pre-estimable
restraints; i.e., $H'K_2 = 0$. Since $k_1$ and $r$
are the ranks of $H'K_1$ and $H$ respectively, we must
have $k_1 \leq r$.

¹³ We will use the term "restraint" to refer to a constrained linear form as well as to the
constraint equation itself.

¹⁴ Since $S$ is nonsingular, $L_1'\beta = m_1$ is logically equivalent to $S'L_1'\beta = S'm_1$.

¹⁴ᵃ For some nonsingular $\nu \times \nu$ matrix $U$, we have $F = \bar F U$ where $\bar F'$ ($\nu \times s_1$) consists of $\nu = k_1$
independent rows of $H'L_1$. Also $L_1'H = \bar F P$, where $P$ is a $k_1 \times r$ matrix of rank $k_1$. Then
$K_1'H = F'L_1'H = U'\bar F'\bar F P$; since $U'\bar F'\bar F$ is nonsingular, $K_1'H$ is also of rank $k_1$.

Theorem 2. Let $X$, $\beta$, and $Y$ be as before, satisfying

$$E(Y) = X'\beta, \qquad \operatorname{var} Y = \sigma^2 I.$$

Also let there be given known linear restraints among the parameters of the form

$$K_1'\beta = m_1, \qquad K_2'\beta = m_2,$$

where $K_i$ is $p \times k_i$ with rank $k_i$ and $m_i$ is $k_i \times 1$. The $k_1$ restraints $K_1'\beta = m_1$ are irreducible and nonpre-estimable whereas the $k_2$ restraints $K_2'\beta = m_2$ are pre-estimable. With $H$ as before, let $H = [H_0, H_1]$ correspond to a partition such that $H_0$ is $p \times (r - k_1)$ and $H_1$ is $p \times k_1$ where $\det H_1'K_1 \ne 0$. Then the minimum variance linear unbiased estimate of the estimable function $\theta = l'\beta$ is

$$\hat\theta = l'\{CXY + H_1(K_1'H_1)^{-1}m_1 + CK_2(K_2'CK_2)^{-1}(m_2 - K_2'CXY)\},$$

where the matrix $C$ is obtained from Lemma 2 with $K$ taken to be $K = [K_0, K_1]$ and $K_0$ $(p \times (r - k_1))$ chosen such that $\det H'K \ne 0$.

Proof.$^{14}$ A partition $H = [H_0, H_1]$ with the desired properties can be formed by taking $H_1'$ to consist of $k_1$ rows of $H'$ in the same positions as $k_1$ linearly independent rows of $H'K_1$. We first show that the unbiased linear estimates of $\theta = l'\beta$ are precisely the linear forms

$$g(Y) = d'Y + d_1'm_1 + d_2'm_2 \tag{4.1}$$

for which

$$Xd + K_1d_1 + K_2d_2 = l, \tag{4.2}$$

where $d$ is an $n \times 1$ vector and $d_i$ is a $k_i \times 1$ vector. Thus $\theta$ is estimable if and only if (4.2) has a solution $[d', d_1', d_2']$. The proof is based on the observation that

$$Z' = [Y', m_1', m_2']$$

defines an $(n + k_1 + k_2) \times 1$ random vector $Z$ (recall that a constant is a special case of a random variable), and that $E(Y) = X'\beta$, $K_1'\beta = m_1$, $K_2'\beta = m_2$ are equivalent to $E(Z) = [X, K_1, K_2]'\beta$. Thus the assertion is proved by the proof of (3.4), with $Z$ replacing $Y$, $[X, K_1, K_2]$ replacing $X$, and $[d', d_1', d_2']$ replacing $d'$.

$^{14}$ We point out in advance that rearranging the columns of $H$ and/or $K$ does not alter the properties required of $H$ and $K$ (i.e., (2.2)).

The nonsingularity of $H_1'K_1$ can be used to solve (4.2) for $d_1$ after premultiplying it by $H_1'$. The result is

$$d_1 = (H_1'K_1)^{-1}H_1'l,$$

so that the unbiased linear estimates of $\theta = l'\beta$ are precisely the linear forms

$$g(Y) = d'Y + d_2'm_2 + l'H_1(K_1'H_1)^{-1}m_1 \tag{4.3}$$

for which

$$Xd + K_2d_2 = [I - K_1(H_1'K_1)^{-1}H_1']l = l^*. \tag{4.4}$$

Thus $\theta$ is estimable if and only if (4.4) has a solution $[d', d_2']$.

Since $\operatorname{var}[g(Y)] = \operatorname{var}(d'Y) = \sigma^2 d'd$, as before (in section 3) the objective is to minimize $d'd$, but now the side condition on $d$ is the existence of a $d_2$ related to $d$ by (4.4). Initially regard $d_2$ as fixed; then the minimization of $d'd$ subject to $Xd = l^* - K_2d_2$ is identical with the problem of finding a best estimate for $(l^* - K_2d_2)'\beta$ subject to (3.1). By the Gauss theorem of section 3, the unique solution (as a function of $d_2$) is

$$d = X'C(l^* - K_2d_2) = X'C\{[I - K_1(H_1'K_1)^{-1}H_1']l - K_2d_2\}. \tag{4.5}$$

In (4.5) the matrix $C$ is obtained from Lemma 2 for some appropriate $K$; choosing$^{15}$ $K$ as in the statement of the theorem yields $CK_1 = 0$ (since $C$ is symmetric and (2.3a) holds), so that (4.5) simplifies to

$$d = X'Cl - X'CK_2d_2. \tag{4.6}$$

This simplification was the purpose for choosing the indicated form $K = [K_0, K_1]$.

Let $Y^* = X'Cl$ and $Z^* = K_2'CX$. Then the quadratic form $d'd$ to be minimized becomes, by (4.6),

$$Q^* = (Y^* - (Z^*)'d_2)'(Y^* - (Z^*)'d_2),$$

and the condition on $d_2$ is that it be related to some $d$ by (4.4). This condition is, however, automatically satisfied for any $d_2$, which can be seen as follows. First, the estimability of $\theta$ implies that $l^*$ can be written in at least one way in the form (4.4), say

$$l^* = X\delta + K_2\delta_2.$$

Second, the pre-estimability of $K_2'\beta$ (i.e., the fact that $H'K_2 = 0$) implies that $K_2 = XB$ for some $n \times k_2$ matrix $B$. Combining these observations gives (for any $d_2$)

$$l^* - K_2d_2 = X(\delta + B\delta_2 - Bd_2) = Xd \qquad (d = \delta + B\delta_2 - Bd_2)$$

as desired.

The choices of $d_2$ (now unrestrained) which minimize $Q^*$ are known by the Gauss theorem to be precisely the vectors

$$d_2 = C^*Z^*Y^* + (I - C^*A^*)z, \tag{4.7}$$

where $z$ is an arbitrary $k_2 \times 1$ vector and $C^*$ is related to

$$A^* = Z^*(Z^*)' = K_2'CXX'CK_2 = K_2'CK_2 \tag{4.8}$$

as $C$ is to $A$. We shall however show below that $I - A^*C^* = 0$, so that $A^*$ is nonsingular and the solution becomes uniquely

$$d_2 = (A^*)^{-1}Z^*Y^* = (K_2'CK_2)^{-1}K_2'Cl. \tag{4.9}$$

Substitution of (4.6) and (4.9) into (4.3) gives the best estimate $\hat\theta$ as asserted in the statement of the theorem.

Since $K_2$ has linearly independent columns, we can prove $I = A^*C^*$ by showing that

$$(I - A^*C^*)K_2' = 0.$$

For this purpose write $K_2 = XB$ as above, and use eq (2.6a) to obtain

$$K_2' = B'(XX^-X)' = B'X'(X^-)'X' = K_2'(X'C)'X' = Z^*X'.$$

Then the version $A^*C^*Z^* = Z^*$ of (2.7) gives

$$(I - A^*C^*)K_2' = (I - A^*C^*)Z^*X' = 0$$

as desired, completing the proof of the theorem. We shall frequently use the consequence

$$ACK_2 = K_2 \tag{4.10}$$

of $K_2 = XB$ and $ACX = X$.

Corollary 2.1: The parametric function $\theta = l'\beta$ is estimable if and only if

$$H_0'[I - K_1(H_1'K_1)^{-1}H_1']l = 0. \tag{4.11}$$

Furthermore if $H_0$ is chosen such that $H_0'K_1 = 0$, then the condition reduces to

$$H_0'l = 0. \tag{4.12}$$



Proof. The necessary and sufficient condition for $\theta$ to be estimable was shown to be (4.4), i.e.,

$$Xd + K_2d_2 = [I - K_1(H_1'K_1)^{-1}H_1']l \tag{4.13}$$

holds for some $d$ and $d_2$. Since $H_0'X = 0$ and $H_0'K_2 = 0$, eq (4.11) holds. Conversely if (4.11) holds, then

$$H'[I - K_1(H_1'K_1)^{-1}H_1']l = 0,$$

which implies that (4.13) holds for some $d$ and $d_2$, which in turn means that $\theta = l'\beta$ is estimable.

$^{15}$ To show that such a choice is possible in at least one way, select any $p \times (r - k_1)$ matrix $K_0$ of rank $r - k_1$ such that $H_1'K_0 = 0$ and $AK_0 = 0$. If $H_0 = K_0$ held, then (2.2a) would be satisfied and

$$H'K = \begin{bmatrix} H_0'K_0 & H_0'K_1 \\ H_1'K_0 & H_1'K_1 \end{bmatrix}$$

would have nonsingular square blocks $H_0'K_0$ and $H_1'K_1$ on its main diagonal, implying the desired relation $\det(H'K) \ne 0$. Since by Lemma 1 this relation (for fixed $K$) is independent of the particular choice of $H$, it persists even if $H_0 \ne K_0$.

The only restrictions on $H = [H_0, H_1]$ and $K = [K_0, K_1]$ are that $\det H'K \ne 0$, $H'A = 0$, $\det H_1'K_1 \ne 0$ and that both $H$ and $K$ have rank $r = p - q$. The matrix $H_0$ can be chosen in any way subject to satisfying the above conditions. $H_0$ can always be taken to satisfy $H_0'K_1 = 0$ by taking an initial $\bar H_0$ for which the above conditions hold and letting

$$H_0 = [I - H_1(K_1'H_1)^{-1}K_1']\bar H_0.$$

It is easy to verify that $H_0'A = 0$, and further that $H_0'K_1 = 0$. By Lemma 1, if the matrix $H = [H_0, H_1]$ has rank $r$, then $\det(H'K) \ne 0$. To prove that $H$ has the required rank $r$, we write the formula for $H_0$ as

$$H_0 = \bar H_0 - H_1P \qquad [P = (K_1'H_1)^{-1}K_1'\bar H_0]$$

and observe that

$$[H_0, H_1] = [\bar H_0, H_1]\begin{bmatrix} I & 0 \\ -P & I \end{bmatrix},$$

where the first factor on the right-hand side has rank $r$ while the second is $r \times r$ and nonsingular.

Note that the previous construction did not depend on $K_0$. We now show, in addition, that $K_0$ can be so chosen that $H_1'K_0 = 0$. Simply replace an initial $\bar K_0$ by

$$K_0 = [I - K_1(H_1'K_1)^{-1}H_1']\bar K_0;$$

then $H_1'K_0 = 0$, and $H_0'K_0$ has the required rank $r - k_1$ since it coincides with $H_0'\bar K_0$.
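The estimability conditions (4.11) and (4.12), and the adjustment of $H_0$ just described, can be tried on a very small case. The sketch below is illustrative only: the model $E(Y_j) = \beta_1 + \beta_2 + \beta_3$, the restraint $\beta_1 = m_1$, and all function and variable names are assumptions made for the example, not taken from the paper.

```python
from fractions import Fraction as Fr

def matmul(P, Q):
    """Multiply matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Q)] for row in P]

def transpose(P):
    return [list(col) for col in zip(*P)]

def col(*entries):
    """Column vector built from scalars."""
    return [[Fr(e)] for e in entries]

# Hypothetical model: E(Y_j) = b1 + b2 + b3 (j = 1, 2), so X (p x n = 3 x 2)
# has identical rows, q = 1 and r = p - q = 2.
X = [[Fr(1), Fr(1)] for _ in range(3)]
H0_init = col(1, 0, -1)   # initial H0, p x (r - k1)
H1 = col(1, -1, 0)        # H1, p x k1
K1 = col(1, 0, 0)         # restraint b1 = m1: k1 = 1, nonpre-estimable

h1k1 = matmul(transpose(H1), K1)[0][0]   # H1'K1 is 1 x 1 in this example

def cond_411(l):
    """Left side of (4.11), H0'[I - K1 (H1'K1)^{-1} H1'] l, with the initial H0."""
    h1l = matmul(transpose(H1), l)[0][0]
    l_star = [[li[0] - k[0] * h1l / h1k1] for li, k in zip(l, K1)]
    return matmul(transpose(H0_init), l_star)[0][0]

# Adjusted H0 = [I - H1 (K1'H1)^{-1} K1'] H0_init, chosen so that H0'K1 = 0.
k1h0 = matmul(transpose(K1), H0_init)[0][0]
H0 = [[h0[0] - h1[0] * k1h0 / h1k1] for h0, h1 in zip(H0_init, H1)]

def cond_412(l):
    """Left side of (4.12), H0'l, with the adjusted H0."""
    return matmul(transpose(H0), l)[0][0]
```

In this example both conditions reduce to $l_2 = l_3$, which matches the intuition that, once $\beta_1$ is known, only $\beta_2 + \beta_3$ can be recovered from the data.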

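For simplicity we shall assume in what follows that $H_0$ and $K_0$ are chosen so that both $H_0'K_1 = 0$ and $H_1'K_0 = 0$. Thus the estimability condition is given by $H_0'l = 0$, and the frequently occurring inverse $(H'K)^{-1}$ takes the simple form

$$(H'K)^{-1} = \begin{bmatrix} (H_0'K_0)^{-1} & 0 \\ 0 & (H_1'K_1)^{-1} \end{bmatrix}. \tag{4.14}$$

Then the general form of the vector $\hat\beta$, such that the best estimate of every estimable $l'\beta$ is $l'\hat\beta$, is given by

$$\hat\beta = CXY + H_0(K_0'H_0)^{-1}m_0 + H_1(K_1'H_1)^{-1}m_1 + CK_2(K_2'CK_2)^{-1}(m_2 - K_2'CXY), \tag{4.15}$$

where $m_0$ is an arbitrary $(r - k_1) \times 1$ vector.

The next corollary formulates some systems of equations involving "dummy" variables $(\mu_0, \mu_1, \lambda)$ and artificial restraints $(K_0'\beta = m_0)$ which can be used to solve for $\hat\beta$ of (4.15).

Corollary 2.2: Let $\mu_0$ and $m_0$ be $(r - k_1) \times 1$ vectors, $\mu_1$ a $k_1 \times 1$ vector and $\lambda$ a $k_2 \times 1$ vector. Then every solution $\hat\beta$ of the system

$$\begin{bmatrix} A & K_2 \\ K_1' & 0 \\ K_2' & 0 \end{bmatrix}\begin{bmatrix} \hat\beta \\ \lambda \end{bmatrix} = \begin{bmatrix} XY \\ m_1 \\ m_2 \end{bmatrix} \tag{4.16}$$

can be used in the best estimate $\hat\theta = l'\hat\beta$ of any estimable function $\theta = l'\beta$. The same holds for the unique solutions of each of the systems

$$\begin{bmatrix} A & K_0 & K_1 & K_2 \\ K_0' & 0 & 0 & 0 \\ K_1' & 0 & 0 & 0 \\ K_2' & 0 & 0 & 0 \end{bmatrix}\begin{bmatrix} \hat\beta \\ \mu_0 \\ \mu_1 \\ \lambda \end{bmatrix} = \begin{bmatrix} XY \\ m_0 \\ m_1 \\ m_2 \end{bmatrix} \tag{4.17}$$

$$\begin{bmatrix} A & K_2 \\ K_0' & 0 \\ K_1' & 0 \\ K_2' & 0 \end{bmatrix}\begin{bmatrix} \hat\beta \\ \lambda \end{bmatrix} = \begin{bmatrix} XY \\ m_0 \\ m_1 \\ m_2 \end{bmatrix} \tag{4.18}$$

$$\begin{bmatrix} A + H_0K_0' + H_1K_1' & K_2 \\ K_2' & 0 \end{bmatrix}\begin{bmatrix} \hat\beta \\ \lambda \end{bmatrix} = \begin{bmatrix} XY + H_0m_0 + H_1m_1 \\ m_2 \end{bmatrix} \tag{4.19}$$

as well as the vector

$$\hat\beta = \hat\beta_0 + CK_2(K_2'CK_2)^{-1}(m_2 - K_2'CXY), \tag{4.20}$$

where $\hat\beta_0$ is obtained from the unique solution of

$$\begin{bmatrix} A & K_0 & K_1 \\ K_0' & 0 & 0 \\ K_1' & 0 & 0 \end{bmatrix}\begin{bmatrix} \hat\beta_0 \\ \mu_0 \\ \mu_1 \end{bmatrix} = \begin{bmatrix} XY \\ m_0 \\ m_1 \end{bmatrix}. \tag{4.21}$$

Proof. System (4.16) does not have a unique solution for $k_1 < r$, but for any solution $[\hat\beta', \lambda']$ we can define a vector $m_0$ by $K_0'\hat\beta = m_0$ and observe that $[\hat\beta', \lambda']$ satisfies (4.18). Thus the discussion of (4.16) reduces to that of (4.18).

Since the $\hat\beta$ of (4.15) clearly obeys $K_i'\hat\beta = m_i$, and also (premultiply $\hat\beta$ by $CA$) satisfies

$$C\{A\hat\beta + K_2\lambda - XY\} = 0,$$

where

$$\lambda = (K_2'CK_2)^{-1}(K_2'CXY - m_2), \tag{4.22}$$

we find that $\hat\beta$ and $\lambda$ obey (4.18). Thus the solution of (4.18), once it is proved unique, must have the form (4.15).

The first subsystem

$$A\hat\beta + K_0\mu_0 + K_1\mu_1 + K_2\lambda = XY$$

of (4.17), when premultiplied by $(H_0'K_0)^{-1}H_0'$, yields $\mu_0 = 0$; then premultiplication of

$$A\hat\beta + K_1\mu_1 + K_2\lambda = XY$$

by $(H_1'K_1)^{-1}H_1'$ yields $\mu_1 = 0$. Thus every solution of (4.17) is a solution of (4.18), so (4.17) does not require further discussion. Note that the dummy variables $\mu_0$ and $\mu_1$ are zero vectors in the solution.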

It is trivial to check that any solution of (4.18) is also a solution of (4.19). Thus the results for (4.16) through (4.19) will be proved once we show that (4.19) has a unique solution. The utility of (4.19) is that it is a smaller system than those preceding it. We write the first subsystem of (4.19) in the form

$$(A + HK')\hat\beta = XY + Hm - K_2\lambda,$$

where $m' = (m_0', m_1')$. By Corollary 1.2, $A + HK'$ is nonsingular so that $\hat\beta$ is directly determined in terms of $\lambda$ by

$$\hat\beta = (A + HK')^{-1}(XY + Hm) - (A + HK')^{-1}K_2\lambda.$$

Corollary 1.2 shows that the first term on the right is just $\hat\beta_0$, while since $H'K_2 = 0$ the formula (3.12) for $(A + HK')^{-1}$ shows that we have

$$\hat\beta = \hat\beta_0 - CK_2\lambda. \tag{4.23}$$

After premultiplying by $(K_2'CK_2)^{-1}K_2'$, noting $K_2'\hat\beta = m_2$, we obtain a unique $\lambda$.

To treat (4.20) we first use (4.14) to obtain

$$H(K'H)^{-1} = [H_0(K_0'H_0)^{-1},\ H_1(K_1'H_1)^{-1}]$$

and then apply Corollary 1.1 (see (3.11a)) to the system (4.21) to show that its unique solution has

$$\hat\beta_0 = CXY + H_0(K_0'H_0)^{-1}m_0 + H_1(K_1'H_1)^{-1}m_1. \tag{4.24}$$

Thus $\hat\beta$ given by (4.20) coincides with (4.15).

Corollary 2.3. With the particular choice

$$m_0 = 0,$$

for any $\theta = l'\beta$ there is a unique$^{16}$ estimable $\theta_1 = l_1'\beta$ given by $l_1 = [AC + K_1(H_1'K_1)^{-1}H_1']l$ such that for all possible $m_1$, $m_2$ and $\beta$, $l'\hat\beta$ is the best estimate of $\theta_1$.

Proof. First assume $l_1 = [AC + K_1(H_1'K_1)^{-1}H_1']l$ and $\theta_1 = l_1'\beta$. Then $H_0'l_1 = 0$, so $\theta_1$ is estimable, and we have

$$l_1'\hat\beta = l'[CA + H_1(K_1'H_1)^{-1}K_1']\hat\beta = l'\hat\beta$$

by direct calculation (using $CAC = C$, $AH_1 = 0$, $K_1'C = 0$) so that $l'\hat\beta$ is the best estimate of $\theta_1$. To prove uniqueness, consider any estimable $\theta_1 = l_1'\beta$ such that $l'\hat\beta$ is the best estimate of $\theta_1$ for all $m_1$ and $m_2$. Note with the aid of (2.2a), that $\det(H_1'K_1) \ne 0$ implies that $[A, K_1]$ has rank $q + k_1$. Since $H_0'[A, K_1] = 0$ and $H_0$ has rank $p - (q + k_1)$, it follows from $H_0'l_1 = 0$ that $l_1 = Ad + K_1d_1$ for some vectors $d$ and $d_1$. Using $ACX = X$ and $ACK_2 = K_2$ we obtain

$$l_1'\hat\beta = d'[(I - K_2(K_2'CK_2)^{-1}K_2'C)XY + K_2(K_2'CK_2)^{-1}m_2] + d_1'm_1.$$

$^{16}$ The uniqueness assertion requires the assumption mentioned in footnote 9.

Setting $m_i = K_i'\beta$ $(i = 1, 2)$ and equating the coefficients of $Y$ and $\beta$ in $l_1'\hat\beta$ and $l'\hat\beta$, we obtain

$$X'(I - CK_2(K_2'CK_2)^{-1}K_2')(d - Cl) = 0, \tag{4.25}$$

$$K_2(K_2'CK_2)^{-1}K_2'(d - Cl) = K_1[(H_1'K_1)^{-1}H_1'l - d_1]. \tag{4.26}$$

Multiplication of the second equation by $H_1'$ leads to $H_1'l = H_1'K_1d_1$ and thus to

$$d_1 = (H_1'K_1)^{-1}H_1'l$$

as desired. Substitution of this into (4.26) yields a result which when substituted into (4.25) gives

$$X'(d - Cl) = 0,$$

implying that $Ad = ACl$ as desired.

We turn now to the residual vector $\delta = Y - X'\hat\beta$ and the residual sum of squares $S^2 = \delta'\delta$.

Corollary 2.4. The residual sum of squares can be written as

$$S^2 = \delta'\delta = (Y - X'\hat\beta_0)'(Y - X'\hat\beta_0) + \lambda'(K_2'CK_2)\lambda \tag{4.27}$$

and has the expected value

$$E(S^2) = (n - q + k_2)\sigma^2,$$

where $\hat\beta_0$ is the estimate ignoring the pre-estimable restraint $K_2'\beta = m_2$ and

$$\lambda = (K_2'CK_2)^{-1}(K_2'CXY - m_2).$$

Proof. The residual sum of squares can be written

$$S^2 = \delta'\delta = (Y - X'\hat\beta_0 + X'CK_2\lambda)'(Y - X'\hat\beta_0 + X'CK_2\lambda)$$

$$= (Y - X'\hat\beta_0)'(Y - X'\hat\beta_0) + \lambda'(K_2'CK_2)\lambda + 2(Y - X'\hat\beta_0)'X'CK_2\lambda.$$

However we have

$$(Y - X'\hat\beta_0)'X'CK_2\lambda = (Y'X'C - \hat\beta_0'AC)K_2\lambda = (Y'X'C - Y'X'CAC)K_2\lambda = 0$$

and thus

$$S^2 = (Y - X'\hat\beta_0)'(Y - X'\hat\beta_0) + \lambda'(K_2'CK_2)\lambda.$$

From Corollary 1.4 of section 3 we have

$$E[(Y - X'\hat\beta_0)'(Y - X'\hat\beta_0)] = (n - q)\sigma^2.$$

Furthermore

$$E(\lambda) = (K_2'CK_2)^{-1}(K_2'\beta - m_2) = 0, \qquad \operatorname{var}\lambda = (K_2'CK_2)^{-1}\sigma^2.$$

Making use of the formula for finding the expectation of a quadratic form$^{17}$ gives

$$E(\lambda'K_2'CK_2\lambda) = \operatorname{trace}\{(K_2'CK_2)(\operatorname{var}\lambda)\} = \operatorname{trace}\{(K_2'CK_2)(K_2'CK_2)^{-1}\sigma^2\} = k_2\sigma^2,$$

and thus the result is proved. Note that the formula for $S^2$ is composed of two parts, the second of which measures the deviation between the values of the pre-estimable restraints $K_2'\beta$ as estimated from the data ($Y$), and the given values $m_2$ of these restraints.
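Formulas (4.22), (4.23) and (4.27) can be checked exactly on a small full-rank case. The sketch below is only an illustration under stated assumptions: it takes $q = p$, so that $H$ is null, every restraint is pre-estimable, and $C = A^{-1}$; the data, the single restraint $\beta_1 + \beta_2 = 3$, and all names are made up for the example.

```python
from fractions import Fraction as Fr

def matmul(P, Q):
    """Multiply matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Q)] for row in P]

def transpose(P):
    return [list(col) for col in zip(*P)]

def inv2(M):
    """Inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# Illustrative data: E(Y_j) = b1 + b2 * x_j with x = 0, 1, 2, 3 (X is p x n).
X = [[Fr(1)] * 4, [Fr(0), Fr(1), Fr(2), Fr(3)]]
Y = [[Fr(1)], [Fr(3)], [Fr(2)], [Fr(5)]]
K = [[Fr(1)], [Fr(1)]]     # one pre-estimable restraint: b1 + b2 = m
m = Fr(3)

A = matmul(X, transpose(X))
C = inv2(A)                # full-rank assumption: C = A^{-1}
beta0 = matmul(C, matmul(X, Y))                 # estimate ignoring the restraint
CK = matmul(C, K)
KCK = matmul(transpose(K), CK)[0][0]            # K'CK (a scalar here)
lam = (matmul(transpose(K), beta0)[0][0] - m) / KCK       # (4.22)
beta = [[b[0] - ck[0] * lam] for b, ck in zip(beta0, CK)]  # (4.23)

def rss(b):
    """Residual sum of squares (Y - X'b)'(Y - X'b)."""
    fitted = matmul(transpose(X), b)
    return sum((y[0] - f[0]) ** 2 for y, f in zip(Y, fitted))

S2 = rss(beta)                            # left side of (4.27)
S2_split = rss(beta0) + lam * KCK * lam   # right side of (4.27)
```

Exact rational arithmetic is used so that the two sides of (4.27) can be compared for equality rather than to within rounding error.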

Corollary 2.5. The residual vector $\delta = Y - X'\hat\beta$ is uncorrelated with any estimable function; in fact

$$\operatorname{cov}(\delta, \hat\beta) = 0. \tag{4.28}$$

Proof. We write $\delta_0 = Y - X'\hat\beta_0$ as the residual vector if the restraints $K_2'\beta = m_2$ had been ignored. By (4.20) $\delta = \delta_0 + X'CK_2\lambda$ and $\hat\beta = \hat\beta_0 - CK_2\lambda$ and we can write

$$\operatorname{cov}(\delta, \hat\beta) = \operatorname{cov}(\delta_0, \hat\beta_0) - \operatorname{cov}(\delta_0, \lambda)K_2'C + X'CK_2\{\operatorname{cov}(\lambda, \hat\beta_0) - [E(\lambda\lambda')]K_2'C\}. \tag{4.29}$$

From Corollary 1.4 of section 3 we have $\operatorname{cov}(\delta_0, \hat\beta_0) = 0$. For the second term in (4.29) we calculate

$$\operatorname{cov}(\delta_0, \lambda) = \operatorname{cov}(\delta_0, K_2'\hat\beta_0 - m_2)(K_2'CK_2)^{-1} = \operatorname{cov}(\delta_0, \hat\beta_0)K_2(K_2'CK_2)^{-1} = 0. \tag{4.30}$$

For the third term we calculate

$$\operatorname{cov}(\lambda, \hat\beta_0) - [E(\lambda\lambda')]K_2'C \tag{4.31}$$

$$= (K_2'CK_2)^{-1}\operatorname{cov}(K_2'\hat\beta_0 - m_2, \hat\beta_0) - (\operatorname{var}\lambda)K_2'C$$

$$= (K_2'CK_2)^{-1}\{K_2'\operatorname{var}(\hat\beta_0) - K_2'C\sigma^2\}$$

$$= (K_2'CK_2)^{-1}\{K_2'\operatorname{var}(CXY) - K_2'C\sigma^2\} = 0.$$

Substituting in (4.29) we obtain the desired result (4.28).

It is possible to develop the extension of the Gauss theorem in a manner which leans more heavily on properties of the weak generalized inverse. However, the final form of the solution is not useful for practical applications. One possible advantage of this alternative approach is that there is no need to make a distinction between pre-estimable and nonpre-estimable functions. These results are contained in the following theorem.

Theorem 3. Let $X$, $\beta$ and $Y$ be as before, satisfying

$$E(Y) = X'\beta, \qquad \operatorname{var}(Y) = \sigma^2 I$$

and also the restraints

$$L'\beta = m, \tag{4.32}$$

where $L$ is a $p \times k$ matrix of known constants and $m$ is a $k \times 1$ vector of known constants. The minimum variance unbiased linear estimate of any estimable linear function $\theta = l'\beta$ is $\hat\theta = l'\hat\beta$, where $\hat\beta$ (independent of $l$) is given by

$$\hat\beta = (I - LL^-)'\tilde C(I - LL^-)XY + [I - (I - LL^-)'\tilde C(I - LL^-)A](L^-)'m \tag{4.33}$$

and $\tilde C$ is related to $\tilde A = (I - LL^-)A(I - LL^-)'$ as $C$ is to $A$.

Proof. We first show that the unbiased linear estimates of $\theta = l'\beta$ are precisely the linear forms



$$g(Y) = d'Y + p'm \tag{4.34}$$

for which

$$Xd + Lp = l, \tag{4.35}$$

where $d$ is an $n \times 1$ vector and $p$ is a $k \times 1$ vector. Thus $\theta$ is estimable if and only if (4.35) has a solution $(d, p)$. The proof is based on the observation that

$$Z' = [Y', m'] \tag{4.36}$$

defines an $(n + k) \times 1$ random vector $Z$ (recall that a constant is a special case of a random variable), and that (4.32) and $E(Y) = X'\beta$ are equivalent to $E(Z) = [X, L]'\beta$. Thus the assertion is proved by the proof of eq (3.4), with $Z$ replacing $Y$, $[X, L]$ replacing $X$, and $[d', p']$ replacing $d'$.

The variance of $g(Y)$ given by (4.34) is $(d'd)\sigma^2$, so that finding a best estimate of $\theta$ is equivalent to minimizing $d'd$ by a proper choice of $d$, subject to the condition that there exist a $p$ related to $d$ by (4.35). The choice of such a $p$ is immaterial (as long as one exists) since $p$ appears in (4.34) only in the combination

$$p'm = (Lp)'\beta,$$

which by eq (4.35) is determined by $d$ and $l$. If $d$ is such that some $p$ obeys (4.35), then by eq (2.6a)

$$l - Xd = Lp = LL^-Lp = LL^-(l - Xd)$$

with $L^-$ any weak generalized inverse of $L$, so that

$$(I - LL^-)(l - Xd) = 0 \tag{4.37}$$

and a particular solution of (4.35) is $p^* = L^-(l - Xd)$. Conversely if (4.37) is satisfied then $p^*$ provides a solution of eq (4.35) and we can take

$$g(Y) = d'(Y - X'(L^-)'m) + l'(L^-)'m. \tag{4.38}$$

It has been shown that finding a best estimate of $\theta$ is equivalent to minimizing $d'd$ subject to condition (4.37), which can be rewritten as $\tilde X d = \tilde l$ with

$$\tilde X = (I - LL^-)X, \qquad \tilde l = (I - LL^-)l.$$

$^{17}$ If the column vector $Z$ is such that $E(Z) = 0$, $\operatorname{var} Z = \sigma^2\Sigma$, then $E(Z'AZ) = \sigma^2 \operatorname{tr} A\Sigma$.

This is analogous to the problem (treated in the proof of the Gauss theorem) of minimizing $d'd$ subject to $Xd = l$, and so the unique solution is

$$d = \tilde X'\tilde C\tilde l = X'(I - LL^-)'\tilde C(I - LL^-)l, \tag{4.39}$$

from which eq (4.33) follows by substitution into (4.38).
Still another approach to the material of the section can be based on the random variable $Z$ defined by eq (4.36). Namely, as regards the first and second moments with which least-squares theory is exclusively concerned, the model specified by $E(Y) = X'\beta$ and $L'\beta = m$ is equivalent to the model

$$E(Z) = [X, L]'\beta, \qquad \operatorname{var}(Z) = \sigma^2\begin{bmatrix} I_n & 0 \\ 0 & 0_k \end{bmatrix}, \tag{4.40}$$

where $I_n$ is the $n \times n$ identity matrix and $0_k$ is the $k \times k$ zero matrix. Thus a model with linear restraints is equivalent to a "restraintless" model which however involves a singular variance-covariance matrix. Least-squares estimation in such models is discussed in the next section.

5. Gauss Theorem With Arbitrary Variance- 
Covariance Matrix 

The results of the previous sections were derived assuming that the components of the vector of random variables $Y' = (Y_1, Y_2, \ldots, Y_n)$ were uncorrelated and had common variance; i.e., $\operatorname{var} Y = \sigma^2 I$. This section considers some ramifications when $\operatorname{var} Y = \sigma^2 V$ where $V$ is a known $n \times n$ matrix with rank $m$ $(m \le n)$. The case when $m = n$ has been investigated by Aitken [1937]. His result is generalized to include the possibility of a singular variance-covariance matrix.

5.1. Preliminaries 

Before discussing the extension of Aitken's results it will be convenient to record the implications of having a singular variance-covariance matrix. When $V$ is singular with rank $m$ $(m < n)$, then there will exist an $n \times s$ $(s = n - m)$ matrix $F$ with rank $s$ such that $F'V = 0$. However, this also implies that the $s$ components of $F'Y$ have $\operatorname{var} F'Y = (F'VF)\sigma^2 = 0$, which is equivalent to $F'Y$ being equal to a constant.$^{18}$ Since $E(Y) = X'\beta$, we have as the value of this constant

$$F'Y = F'E(Y) = F'X'\beta. \tag{5.1}$$

Then the distribution of $Y = (Y_1, Y_2, \ldots, Y_n)$ is singular and can be reduced to a distribution in $m$ random variables. In most applications when (5.1) holds we generally have $F'X' = 0$. However, it is quite possible that $F'X' \ne 0$. In order to discuss this more general problem, we write $F = (F_1, F_2)$ in partitioned form where $F_i$ is $n \times s_i$ with rank $s_i$ $(i = 1, 2)$ and $s_1 + s_2 = s$. Furthermore we have

$$F_1'X' = 0, \qquad \operatorname{rank} F_2'X' = s_2 \quad (s_2 < p). \tag{5.2}$$

Note that (5.2) combined with (5.1) results in

$$F_1'Y = 0, \qquad F_2'X'\beta = F_2'Y. \tag{5.3}$$

That is, there are $s_1$ independent linear relations among $(Y_1, Y_2, \ldots, Y_n)$ and $s_2$ restraints among the $\beta$ which are pre-estimable by virtue of $H'(XF_2) = 0$.

Another preliminary aspect of the problem is the existence of an $n \times n$ orthogonal matrix $P$ such that

$$P'VP = \begin{bmatrix} 0 & 0 \\ 0 & \Lambda \end{bmatrix}, \tag{5.4}$$

where $\Lambda$ is the $m \times m$ diagonal matrix whose elements are the $m$ nonzero characteristic roots of the symmetric positive semidefinite matrix $V$. Let $G$ be an $n \times m$ matrix such that the columns of $G$ are the $m$ (normalized) characteristic vectors of $V$; i.e.,

$$VG = G\Lambda, \qquad G'G = I.$$

Then the orthogonal matrix $P$ in (5.4) can be taken to be

$$P = [F, G], \tag{5.5}$$

where $F$ is the $n \times s$ matrix mentioned previously, chosen (as is possible) so that $F'F = I$ and $G'F = 0$. By virtue of this partition we have

$$V = G\Lambda G'. \tag{5.6}$$

We also note that $V^+$ is given by

$$V^+ = G\Lambda^{-1}G'.$$

The necessary four properties (2.1a-d) follow from

$$V^+V = G\Lambda^{-1}G'G\Lambda G' = GG'$$

as $G'G = I$.

A frequently occurring case is when $V^2 = cV$ where $c$ is a scalar. Then it can readily be verified that the generalized inverse of $V$ is $V^+ = c^{-2}V$.

Also there will be need for writing the matrix $V^+$ as

$$V^+ = TT', \qquad T = G\Lambda^{-1/2}, \tag{5.7}$$

where $\Lambda^{-1/2}$ denotes the matrix obtained from $\Lambda$ by replacing the diagonal terms by the reciprocals of their positive square roots. Note also that $T'VT = I$.

$^{18}$ The qualifying phrase "with probability one" should be added but we omit such distinctions.
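The relations (5.6) and (5.7), together with the special case $V^2 = cV$, can be verified numerically on a very small singular matrix. The sketch below is only an illustration: the $2 \times 2$ matrix $V$ with all entries equal to 2 (roots 4 and 0, so $n = 2$, $m = 1$, $c = 4$) and all names are assumptions chosen for the example.

```python
import math

def matmul(P, Q):
    """Multiply matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Q)] for row in P]

def transpose(P):
    return [list(col) for col in zip(*P)]

def close(P, Q, tol=1e-12):
    """Entrywise comparison of two matrices up to a small tolerance."""
    return all(abs(a - b) <= tol for r1, r2 in zip(P, Q) for a, b in zip(r1, r2))

s = 1 / math.sqrt(2)
V = [[2.0, 2.0], [2.0, 2.0]]   # singular: characteristic roots 4 and 0
G = [[s], [s]]                 # normalized characteristic vector for the root 4
F = [[s], [-s]]                # spans the null space, so F'V = 0
Lam = [[4.0]]
Lam_inv = [[0.25]]
Lam_inv_sqrt = [[0.5]]

V_plus = matmul(matmul(G, Lam_inv), transpose(G))   # V+ = G Lam^{-1} G'
T = matmul(G, Lam_inv_sqrt)                         # T = G Lam^{-1/2}, as in (5.7)

checks = {
    "V = G Lam G' (5.6)": close(matmul(matmul(G, Lam), transpose(G)), V),
    "F'V = 0": close(matmul(transpose(F), V), [[0.0, 0.0]]),
    "V V+ V = V": close(matmul(matmul(V, V_plus), V), V),
    "V+ V V+ = V+": close(matmul(matmul(V_plus, V), V_plus), V_plus),
    "V+ = T T'": close(matmul(T, transpose(T)), V_plus),
    "T'VT = I": close(matmul(matmul(transpose(T), V), T), [[1.0]]),
    "V+ = c^{-2} V with c = 4": close(V_plus, [[v / 16 for v in row] for row in V]),
}
```

Each entry of `checks` should evaluate to `True`; floating-point arithmetic is used here because the normalized characteristic vectors involve $2^{-1/2}$.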

5.2. Arbitrary Variance-Covariance Matrix 

In this subsection we give some of the main results 
associated with an arbitrary variance-covariance 
matrix. The notation used will correspond to that of 
the preceding sections. 

Theorem 4. Consider the vector of random variables $Y$ having $E(Y) = X'\beta$, $\operatorname{var}(Y) = \sigma^2 V$ where $V$ is an $n \times n$ symmetric positive semidefinite matrix with rank $m$ $(m \le n)$. Then the minimum variance linear unbiased estimate of $\theta = l'\beta$ coincides with its best estimate found from the model

$$E(\tilde Y) = \tilde X'\beta, \quad \operatorname{var}\tilde Y = \sigma^2 I, \qquad \tilde K'\beta = \tilde m, \tag{5.8}$$

where $\tilde X = XT$, $\tilde Y = T'Y$, $\tilde K = XF_2$ and $\tilde m = F_2'Y$. Thus if $F_2$ is null, $\hat\beta$ can be chosen as any solution of the normal equations

$$(XV^+X')\hat\beta = XV^+Y. \tag{5.9}$$

Proof. As in the proof of the Gauss theorem of section 3, obtaining a best estimate $\Delta'Y + c$ of $\theta$ is equivalent to choosing an $n \times 1$ vector $\Delta$, subject to $X\Delta = l$, so as to minimize

$$\operatorname{var}(\Delta'Y) = (\Delta'V\Delta)\sigma^2.$$

On the other hand, from the beginning of the proof of Theorem 3 we see that finding a best estimate in the model (5.8) is equivalent to choosing a pair $[d', p']$, where $d$ is an $m \times 1$ vector and $p$ an $s_2 \times 1$ vector, so as to minimize $d'd$ subject to

$$\tilde X d + \tilde K p = l, \quad \text{i.e.,} \quad X(Td + F_2p) = l.$$

Thus the theorem will be proved if we show how to associate to each vector $\Delta$ a pair $[d', p']$, and to each pair $[d', p']$ a vector $\Delta$, such that in each case

$$X\Delta = X(Td + F_2p), \qquad \Delta'V\Delta = d'd. \tag{*}$$

Given $[d', p']$ we simply set $\Delta = Td + F_2p$; the second relation in (*) then follows from $F'V = 0$ and $T'VT = I$. Given $\Delta$, we employ the orthogonal matrix

$$P = [F, G] = [F_1, F_2, G]$$

to define an $n \times 1$ vector (and thus define $d$ and $p$) by

$$[p_1', p', d'\Lambda^{-1/2}]' = P^{-1}\Delta;$$

then we have

$$\Delta = P[p_1', p', d'\Lambda^{-1/2}]' = F_1p_1 + F_2p + Td.$$

The first requirement of (*) is satisfied because $XF_1 = 0$, and the second for the same reasons as above.

Finally, that the normal equations corresponding to the first line of (5.8) are given by (5.9) follows from substitution for the tilde quantities, together with $TT' = V^+$.

Corollary 4.1 (Aitken). If $V$ is nonsingular then the best estimate of any estimable $\theta = l'\beta$ is given by $\hat\theta = l'\hat\beta$ where $\hat\beta$ is any solution of the normal equations

$$(XV^{-1}X')\hat\beta = XV^{-1}Y. \tag{5.10}$$

Proof. When $V$ is nonsingular, the matrix $F$ is null and $V^+ = V^{-1}$, so the result follows from (5.9).

Corollary 4.2. Let $\tilde X = XT$ have rank $q$. Then the minimum variance linear unbiased estimate of an estimable function is $\hat\theta = l'\hat\beta$ where

$$\hat\beta = \tilde C XV^+Y + \tilde C\tilde K(\tilde K'\tilde C\tilde K)^{-1}(\tilde m - \tilde K'\tilde C XV^+Y) + H(K'H)^{-1}m_0.$$

The matrix $\tilde C$ is related to $\tilde A = \tilde X\tilde X' = XV^+X'$ and $K$ by Lemmas 1 and 2; $K$ is a $p \times r$ $(r = p - q)$ matrix of rank $r$ such that $\det H'K \ne 0$, and $m_0$ is an arbitrary $r \times 1$ vector.

Proof. Since $\tilde X = XT$ has rank $q$, $H$ has the same relation to $\tilde X$ as to $X$. Because $H'\tilde K = 0$, the restraints $\tilde K'\beta = \tilde m$ are pre-estimable in the model (5.8). The result follows from (4.15) upon noting that here $H_1$ and $K_1$ are null, while $H_0 = H$ and $K_0 = K$.
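Corollary 4.1 and the transformation $\tilde X = XT$, $\tilde Y = T'Y$ of Theorem 4 can be compared directly when $V$ is nonsingular. The following sketch is an illustration under stated assumptions: $V$ is taken diagonal, $V = \operatorname{diag}(1, 4, 9)$, so that $T = \operatorname{diag}(1, 1/2, 1/3)$ satisfies $TT' = V^{-1}$; the data and all names are made up for the example.

```python
from fractions import Fraction as Fr

def matmul(P, Q):
    """Multiply matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Q)] for row in P]

def transpose(P):
    return [list(col) for col in zip(*P)]

def inv2(M):
    """Inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def diag(entries):
    n = len(entries)
    return [[entries[i] if i == j else Fr(0) for j in range(n)] for i in range(n)]

# Illustrative full-rank design (X is p x n = 2 x 3) and observations.
X = [[Fr(1), Fr(1), Fr(1)], [Fr(1), Fr(2), Fr(3)]]
Y = [[Fr(2)], [Fr(3)], [Fr(7)]]
V_inv = diag([Fr(1), Fr(1, 4), Fr(1, 9)])   # V = diag(1, 4, 9), assumed known
T = diag([Fr(1), Fr(1, 2), Fr(1, 3)])       # T T' = V^{-1}

# Aitken's normal equations (5.10): (X V^{-1} X') beta = X V^{-1} Y.
XVi = matmul(X, V_inv)
beta_aitken = matmul(inv2(matmul(XVi, transpose(X))), matmul(XVi, Y))

# Ordinary least squares in the transformed model E(T'Y) = (XT)' beta.
X_t = matmul(X, T)
Y_t = matmul(transpose(T), Y)
beta_transformed = matmul(inv2(matmul(X_t, transpose(X_t))), matmul(X_t, Y_t))

# Weighted residuals are orthogonal to the rows of X at the solution.
residual = [[y[0] - f[0]] for y, f in zip(Y, matmul(transpose(X), beta_aitken))]
weighted_score = matmul(XVi, residual)
```

The two estimates coincide exactly, which is the content of the final step of the proof of Theorem 4 in this nonsingular case.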

Corollary 4.3. If $\tilde X$ has rank $q$, then the quantity

$$\tilde S^2 = (\tilde Y - \tilde X'\hat\beta_0)'(\tilde Y - \tilde X'\hat\beta_0) + \lambda'(\tilde K'\tilde C\tilde K)\lambda,$$

in which

$$\hat\beta_0 = \tilde C\tilde X\tilde Y = \tilde C XV^+Y, \qquad \lambda = (\tilde K'\tilde C\tilde K)^{-1}(\tilde K'\tilde C\tilde X\tilde Y - \tilde m),$$

has expectation

$$E(\tilde S^2) = (m - q + s_2)\sigma^2.$$

Proof. This corollary is an application of corollary 2.4 of section 4 to the model (5.8). Note that $\tilde X$ is $p \times m$, which is why $m$ appears in place of $n$ in the formula for $E(\tilde S^2)$.

The first two sentences of the proof of corollary 4.2 show that if $\tilde X = XT$ has rank $q$, then the class of pre-estimable functions is not reduced in passing to the model (5.8). However if $\tilde X$ has rank $\tilde q$ $(\tilde q < q)$, then in passing to (5.8) $H$ is replaced by a $p \times \tilde r$ matrix $\tilde H$ of rank $\tilde r = p - \tilde q$ such that $\tilde H'\tilde X = 0$. Such a matrix can be obtained as $\tilde H = [\tilde H_0, H]$ where $\tilde H_0$ is an appropriate $p \times (q - \tilde q)$ matrix.

It is desirable to have a system of equations for $\hat\beta$ (in Theorem 4) when $F_2$ is not null. Such systems can be obtained (and other information derived) by applying the material of section 4 to the model (5.8). In doing so, it should be kept in mind that the restraints $\tilde K'\beta = \tilde m$ must be separated into those which are pre-estimable (this is the sole class when $\tilde q = q$, as already noted), and those which are not; the latter must be examined for irreducibility (see the paragraph preceding Theorem 2) and "reduced" if necessary.

It is natural, as a next step, to consider a model which involves both the complications of linear restraints on the $\beta$ and an arbitrary variance-covariance matrix $V\sigma^2$. This requires no new extension of the theory, since the only addition is that of the restraints $K_i'\beta = m_i$ $(i = 1, 2)$ where $K_2'\beta$ represent pre-estimable restraints. It is quite possible that some of the restraints $\tilde K'\beta$ coincide with the restraints $K_i'\beta$, in which case the duplicate restraints in $\tilde K'\beta$ (or $K_i'\beta$) may be dropped. Aside from this duplication one will then have the situation

$$E(Y) = X'\beta, \quad \operatorname{var} Y = \sigma^2 V \quad [V \text{ has rank } m\ (m \le n)],$$

$$K_i'\beta = m_i \quad (i = 1, 2),$$

which is identical for purposes of estimating $\theta = l'\beta$ with

$$E(\tilde Y) = \tilde X'\beta, \qquad \operatorname{var}\tilde Y = \sigma^2 I, \tag{5.11}$$

where the tilde (~) quantities are defined as in Theorem 4. A formal proof is obtained by applying Theorem 4 to

$$E(Y^*) = (X^*)'\beta, \qquad \operatorname{var} Y^* = V^*\sigma^2,$$

in which

$$V^* = \begin{bmatrix} V & 0 \\ 0 & 0 \end{bmatrix}.$$

Suppose for example that $X$ is of full rank (i.e., $q = p$) and that $V$ is nonsingular (i.e., $m = n$), and consider the model$^{19}$

$$E(Y) = X'\beta, \quad \operatorname{var}(Y) = \sigma^2 V, \qquad K'\beta = m, \tag{5.11a}$$

where $K$ is a $p \times k$ matrix of rank $k$ and $m$ is a $k \times 1$ vector, both consisting of known constants. By the prescription given in the last paragraph, and from the fact that $m = n$ implies that $\tilde K$ in Theorem 4 is null, we see that an appropriate $\hat\beta$ will be one for the model

$$E(\tilde Y) = \tilde X'\beta, \qquad \operatorname{var}(\tilde Y) = \sigma^2 I, \tag{5.11b}$$

$$K'\beta = m, \tag{5.11c}$$

where $\tilde X = XT$, $\tilde Y = T'Y$, and $T = G\Lambda^{-1/2}$. $G$ is of rank $m = n$ and so $T$ is $n \times n$ nonsingular, implying (since $q = p$) that $\tilde X$ is of rank $p$. Hence the constraints $K'\beta$ are all pre-estimable with respect to (5.11b), i.e. $K = K_2$ in our previous notation. Applying (4.20), we see that we can take

$$\hat\beta = \hat\beta_0 + \tilde CK(K'\tilde CK)^{-1}(m - K'\tilde C\tilde X\tilde Y), \tag{5.11d}$$

where $\tilde C$ is related to $\tilde A = \tilde X\tilde X' = XV^{-1}X'$ as $C$ is to $A$, and where $\hat\beta_0$ is the unique solution of $\tilde A\hat\beta_0 = \tilde X\tilde Y$. Since $\tilde A$ is $p \times p$ nonsingular, we have

$$\tilde C = \tilde A^{-1} = (XV^{-1}X')^{-1}, \qquad \hat\beta_0 = \tilde C\tilde X\tilde Y = (XV^{-1}X')^{-1}XV^{-1}Y,$$

so that substitution in (5.11d) yields

$$\hat\beta = (XV^{-1}X')^{-1}\{XV^{-1}Y + K[K'(XV^{-1}X')^{-1}K]^{-1}(m - K'(XV^{-1}X')^{-1}XV^{-1}Y)\},$$

in agreement with the result obtained for this special case by Chipman and Rao [1964].

$^{19}$ In the rest of this subsection, use of the symbol $m$ both for the rank of $V$ (here $m = n$), and for the $k \times 1$ vector in (5.11a), should cause no confusion.

5.3. Simplification of the Normal Equations

In the model

$$E(Y) = X'\beta, \qquad \operatorname{var} Y = \sigma^2 V,$$

we will have $XF = 0$ (so that the normal equations are given by (5.9)) if and only if $X = XGG'$. For, if this condition holds then

$$XF = (XG)(G'F) = 0,$$

while if $XF = 0$ then $X = MG'$ for some $p \times m$ matrix $M$ (i.e., the rows of $X$ are linear combinations of the orthonormalized characteristic vectors of $V$), and postmultiplication by $G$ yields $M = XG$.

In particular, this will be the case if

$$XV^+ = BX, \qquad X = BXV \tag{5.12}$$

for some nonsingular $p \times p$ matrix $B$. For, the first condition in (5.12) yields $X = MG'$ with $M = B^{-1}XG\Lambda^{-1}$, while the second yields it with $M = BXG\Lambda$. The two conditions of (5.12) are logically equivalent, for the first implies

$$X = XGG' = (XG\Lambda^{-1}G')(G\Lambda G') = (XV^+)V = BXV$$

while the second implies

$$XV^+ = BXVV^+ = BXGG' = BX.$$

If (5.12) holds then the normal equations are

$$(XV^+X')\hat\beta = XV^+Y,$$

and become

$$(BXX')\hat\beta = BXY,$$

which are equivalent to the usual normal equations $A\hat\beta = XY$ obtained when $V = I$. This result seems to have been first noted by T. W. Anderson [1948], and Muller and Watson [1959] have discussed it in the context of randomization theory.

For the rest of this section we assume $XF = 0$ (i.e., $X = XGG'$), and ask when a simplification of the normal equations something like the one described above is possible. Note that $X = XGG'$ implies $q \le m$. If $q = m$ we can partition

$$X' = [X_1', X_2'], \qquad \beta' = [\beta_1', \beta_2'],$$

where $X_1$ is $q \times n$ of rank $q$, $X_2$ is $r \times n$, $\beta_1$ is $q \times 1$ and $\beta_2$ is $r \times 1$. The normal equations become

$$(X_1V^+X_1')\hat\beta_1 + (X_1V^+X_2')\hat\beta_2 = X_1V^+Y,$$

$$(X_2V^+X_1')\hat\beta_1 + (X_2V^+X_2')\hat\beta_2 = X_2V^+Y.$$

Since $X = XGG'$ implies $X_1 = X_1GG'$, $X_1G$ has rank $q = m$ and hence

$$X_1V^+ = X_1G\Lambda^{-1}G' = B_1X_1, \qquad B_1 = (X_1G)\Lambda^{-1}(X_1G)^{-1},$$

where $B_1$ is $q \times q$ nonsingular. Premultiplication of the first normal equations by $B_1^{-1}$ (after substitution of $B_1X_1$ for $X_1V^+$) yields

$$(X_1X_1')\hat\beta_1 + (X_1X_2')\hat\beta_2 = X_1Y,$$

$$(X_2X_1'B_1')\hat\beta_1 + (X_2V^+X_2')\hat\beta_2 = X_2V^+Y. \tag{5.13}$$

Thus at least the first subsystem of the normal equations has been somewhat simplified.

If in particular $p = q = m$ then we have (5.12) and the resulting full simplification. If $p > q = m$ but $X_1X_2' = 0$, then the normal equations reduce to

$$(X_1X_1')\hat\beta_1 = X_1Y,$$

$$(X_2V^+X_2')\hat\beta_2 = X_2V^+Y, \tag{5.14}$$

and the solutions for $\hat\beta_1$ are the same as if $\operatorname{var} Y = \sigma^2 I$. Without assuming $X_1X_2' = 0$, we can observe that $X_2' = X_1'N$ for some $q \times r$ matrix $N$ and that $A_1 = X_1X_1'$ is $q \times q$ nonsingular; thus the first subsystem of (5.13) can be solved for $\hat\beta_1$ as

$$\hat\beta_1 = A_1^{-1}X_1Y - N\hat\beta_2,$$

and the second subsystem becomes

$$N'(B_1 - A_1B_1'A_1^{-1})A_1N\hat\beta_2 = N'(B_1 - A_1B_1'A_1^{-1})X_1Y,$$

which is to be solved for $\hat\beta_2$. If in addition $X_2$ has rank $r$ (which requires $r \le q$), then $N$ does too and one can first find the unique $\bar\beta_1$ such that $A_1\bar\beta_1 = X_1Y$, and then satisfy the second subsystem by solving $N\hat\beta_2 = \bar\beta_1$, i.e. $\hat\beta_2 = (N'N)^{-1}N'\bar\beta_1$.
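The full simplification under condition (5.12) can be illustrated with a small nonsingular case, where $V^+ = V^{-1}$. The sketch below is an assumption-laden example rather than anything from the paper: $V$ is taken equicorrelated with $n = 3$ and correlation $1/2$, the rows of $X$ are chosen to be characteristic vectors of $V$, and all names and data are made up.

```python
from fractions import Fraction as Fr

def matmul(P, Q):
    """Multiply matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Q)] for row in P]

def transpose(P):
    return [list(col) for col in zip(*P)]

def inv2(M):
    """Inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

n, rho = 3, Fr(1, 2)
V = [[Fr(1) if i == j else rho for j in range(n)] for i in range(n)]
# Closed-form inverse of an equicorrelated matrix (checked in the test below).
V_inv = [[(int(i == j) - rho / (1 + (n - 1) * rho)) / (1 - rho) for j in range(n)]
         for i in range(n)]
identity = [[Fr(int(i == j)) for j in range(n)] for i in range(n)]

# Rows of X are characteristic vectors of V: a constant row and a contrast.
X = [[Fr(1), Fr(1), Fr(1)], [Fr(-1), Fr(0), Fr(1)]]
Y = [[Fr(1)], [Fr(4)], [Fr(2)]]

# Here X V^{-1} = B X with B diagonal, i.e. the first condition of (5.12).
B = [[1 / (1 + (n - 1) * rho), Fr(0)], [Fr(0), 1 / (1 - rho)]]
XVi = matmul(X, V_inv)

beta_general = matmul(inv2(matmul(XVi, transpose(X))), matmul(XVi, Y))  # from (5.9)
beta_usual = matmul(inv2(matmul(X, transpose(X))), matmul(X, Y))        # A beta = XY
```

In this example the general normal equations and the usual ones ($V = I$) have the same solution, as the discussion following (5.12) asserts.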

If $q < m$ the situation is more complicated. This is illustrated by the following example (due to K. Goldberg, NBS) in which $p = q = 1$ and $n = m = 2$. Take $X = [1, 0]$ and

$$G = G' = 2^{-1/2}\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}, \qquad \Lambda = \begin{bmatrix} 1 & 0 \\ 0 & 1/2 \end{bmatrix}.$$

Then $XV^+ = XG\Lambda^{-1}G' = [3/2, -1/2]$ but $BX$ has the form $[t, 0]$ for all $1 \times 1$ matrices $B$; hence (5.12) or its analog $X_1V^+ = B_1X_1$ cannot hold.
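The arithmetic of this counterexample is easy to reproduce. The sketch below is hedged: the printed entries of $\Lambda$ did not survive in the source, so $\Lambda = \operatorname{diag}(1, 1/2)$ is an assumption, chosen because with the stated $X$ and $G$ it reproduces the stated value $XV^+ = [3/2, -1/2]$; all names are illustrative.

```python
from fractions import Fraction as Fr

def matmul(P, Q):
    """Multiply matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Q)] for row in P]

def scale(c, P):
    return [[c * a for a in row] for row in P]

# G = 2^{-1/2} M with M symmetric, so G D G' = (1/2) M D M for any diagonal D;
# this keeps the computation in exact rational arithmetic.
M = [[Fr(1), Fr(1)], [Fr(1), Fr(-1)]]
Lam = [[Fr(1), Fr(0)], [Fr(0), Fr(1, 2)]]   # assumed characteristic roots
Lam_inv = [[Fr(1), Fr(0)], [Fr(0), Fr(2)]]

V = scale(Fr(1, 2), matmul(matmul(M, Lam), M))           # V  = G Lam G'
V_plus = scale(Fr(1, 2), matmul(matmul(M, Lam_inv), M))  # V+ = G Lam^{-1} G'

X = [[Fr(1), Fr(0)]]
XV_plus = matmul(X, V_plus)

# B X = [t, 0] for every scalar B, so X V+ = B X would need a zero second entry.
condition_512_possible = XV_plus[0][1] == 0
```

With these values `condition_512_possible` is `False`, in line with the conclusion of the example.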

Even when $q < m$, some simplification is possible if there is a partition $G = [G_1, G_2]$, with $G_1$ $(n \times q)$, such that $X_1G_2 = 0$. For then, if

$$\Lambda = \operatorname{diag}(\Lambda_1, \Lambda_2)$$

denotes the decomposition of $\Lambda$ corresponding to the partition of $G$, we have

$$X_1 = X_1GG' = X_1(G_1G_1' + G_2G_2') = X_1G_1G_1',$$

$$X_1V^+ = X_1(G_1\Lambda_1^{-1}G_1' + G_2\Lambda_2^{-1}G_2') = X_1G_1\Lambda_1^{-1}G_1',$$

and can mimic the procedure for $q = m$ (up to and including (5.14)) using $G_1$ and $\Lambda_1$ instead of $G$ and $\Lambda$. At present it is not clear what other cases admit analogous simplification if $q < m$. One such situation arises if we change the dimensions of the partition of $X'$ so that $X_i$ is $p_i \times n$ $(p_1 + p_2 = p)$, $\beta_i$ is $p_i \times 1$, and $X_1$ has rank $p_1$ (implying $p_1 \le q$). If there is a partition $G = [G_1, G_2]$, with $G_1$ $(n \times p_1)$, such that $X_1G_2 = 0$, then the preceding analysis still applies.

5.4. Equicorrelated Variables 

In many experimental situations the covariances between the observations are not zero, but to a reasonable degree of approximation may be regarded as being equal; i.e., cov$(Y_i, Y_j) = \rho\sigma^2$ $(i \ne j)$. Therefore we can write var $Y = V\sigma^2$ where

$$V = (1 - \rho)I + \rho J \tag{5.15}$$

and $J$ is an $n \times n$ matrix with all elements unity.

The matrix $V$ has the two distinct characteristic roots $[1 + (n-1)\rho]$ and $(1 - \rho)$ with multiplicities one and $(n-1)$ respectively. However, since $V\sigma^2$ is a variance-covariance matrix, it is positive semidefinite and the roots are nonnegative. Consequently

$$1 + (n-1)\rho \ge 0, \qquad 1 - \rho \ge 0,$$

and we obtain the bounds $-(n-1)^{-1} \le \rho \le 1$. When either $\rho = -(n-1)^{-1}$ or $\rho = 1$, a characteristic root will be zero and $V$ will be singular. If the $Y_i$ are increasing linear functions of one another, $\rho$ will be equal to unity. The case $\rho = -(n-1)^{-1}$ implies that $\sum_{i=1}^{n} Y_i = \text{constant}$, $V^{+} = (n-1)n^{-1}\{I - J/n\}$, and that the sum of the elements in any row or column of $V$ is zero.

When $\rho \ne 1$ and $\rho \ne -(n-1)^{-1}$, $V$ has an inverse which is given by

$$V^{-1} = (1-\rho)^{-1}\{I - \rho[1 + (n-1)\rho]^{-1}J\}.$$
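
The characteristic roots and the closed-form inverse are easy to confirm numerically. The sketch below uses exact fractions; the values $n = 4$, $\rho = 1/3$ and all function names are illustrative choices of ours.

```python
from fractions import Fraction as F

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def equicorrelated(n, rho):
    """V = (1 - rho) I + rho J."""
    return [[F(1) if i == j else rho for j in range(n)] for i in range(n)]

def equicorrelated_inverse(n, rho):
    """Closed form (1 - rho)^{-1} { I - rho [1 + (n-1) rho]^{-1} J }."""
    c = rho / (1 + (n - 1) * rho)
    return [[(F(int(i == j)) - c) / (1 - rho) for j in range(n)] for i in range(n)]

n, rho = 4, F(1, 3)                      # illustrative values only
V = equicorrelated(n, rho)
identity = [[F(int(i == j)) for j in range(n)] for i in range(n)]

ones = [[F(1)] for _ in range(n)]
contrast = [[F(1)], [F(-1)], [F(0)], [F(0)]]   # orthogonal to the vector of ones

# V 1 = [1 + (n-1) rho] 1   and   V u = (1 - rho) u whenever 1'u = 0.
root_for_ones = matmul(V, ones) == [[(1 + (n - 1) * rho) * e[0]] for e in ones]
root_for_contrast = matmul(V, contrast) == [[(1 - rho) * e[0]] for e in contrast]
inverse_ok = matmul(V, equicorrelated_inverse(n, rho)) == identity
print(root_for_ones, root_for_contrast, inverse_ok)
```

Any other $\rho$ strictly inside the bounds $-(n-1)^{-1} < \rho < 1$ can be substituted; at either bound the division in `equicorrelated_inverse` fails, reflecting the singularity of $V$.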

Therefore using (5.10) the normal equations can be written

$$\{A - \rho[1 + (n-1)\rho]^{-1}XJX'\}\hat\beta = XY - \rho[1 + (n-1)\rho]^{-1}XJY, \tag{5.16}$$

where $A = XX'$. The conditions for estimability of a parametric function only involve first moments and hence are not dependent on $\rho$. Therefore the function $\theta = l'\beta$ is estimable if and only if $H'l = 0$ where $H'X = 0$. The solution of (5.16) involves knowledge of $\rho$. However, we wish to determine the parametric functions which can be estimated without knowledge of $\rho$.

Let $\mathbf{1}$ denote an $n \times 1$ vector of ones, so that $J = \mathbf{1}\mathbf{1}'$. Since $\mathbf{1}$ and therefore $n^{-1/2}\mathbf{1}$ is a characteristic vector of $V$ corresponding to the characteristic root $[1 + (n-1)\rho]$, we can take the matrix $G$ of subsection 5.1 as $G = [n^{-1/2}\mathbf{1}, M]$. Here the $n-1$ columns of $M$ are characteristic vectors of $V$ corresponding to the characteristic root $(1 - \rho)$, and

$$M'M = I_{n-1}, \qquad \mathbf{1}'M = 0, \qquad VM = (1-\rho)M.$$

Since $G$ is square, $G'G = I$ implies $GG' = I$ and therefore

$$MM' = I - n^{-1}J.$$

Also, in the notation of subsection 5.1,

$$T = G\Lambda^{-1/2} = [\,n^{-1/2}(1 + (n-1)\rho)^{-1/2}\mathbf{1},\ (1-\rho)^{-1/2}M\,],$$

so in applying Theorem 4

$$\tilde X = XT = [\,n^{-1/2}(1 + (n-1)\rho)^{-1/2}X\mathbf{1},\ (1-\rho)^{-1/2}XM\,],$$

$$\tilde Y = T'Y = \begin{bmatrix} n^{-1/2}(1 + (n-1)\rho)^{-1/2}\,\mathbf{1}'Y \\ (1-\rho)^{-1/2}M'Y \end{bmatrix}.$$

The equation $E(\tilde Y) = \tilde X'\beta$ of (5.8) therefore becomes equivalent to

$$E(\mathbf{1}'Y) = \mathbf{1}'X'\beta, \tag{5.17}$$

$$E(\bar Y) = \bar X'\beta \qquad (\bar X = XM,\ \bar Y = M'Y), \tag{5.18}$$

and it is readily verified that

$$\operatorname{var}(\bar Y) = (1-\rho)\sigma^2 I, \qquad \operatorname{cov}(\mathbf{1}'Y, \bar Y) = 0. \tag{5.19}$$

The unbiased estimates of any estimable $\theta = l'\beta$ have the form

$$g(Y) = d'\bar Y + e(\mathbf{1}'Y)$$

where $d$ is an $(n-1) \times 1$ vector, $e$ is a scalar, and

$$\bar X d + X\mathbf{1}e = l. \tag{5.20}$$

Also,

$$\operatorname{var}[g(Y)] = (1-\rho)\sigma^2 d'd + e^2 n[1 + (n-1)\rho]\sigma^2 = (1-\rho)\sigma^2[d'd + e^2 n] + e^2 n^2 \rho\sigma^2.$$

First suppose the rank of $\bar X = XM$ is less than the rank $q$ of $X$. This rank must be $q - 1$. Then there exists a $p \times 1$ vector $h$ such that $h'X = \mathbf{1}'$, and (5.20) yields

$$h'l = h'XMd + h'X\mathbf{1}e = ne,$$

so that

$$e = e_0 = n^{-1}(l'h)$$

in every estimate $g(Y)$. (Footnote: A simple sufficient condition for the existence of such a vector $h$ is that the columns of $X$ all sum to some nonzero constant $k$. That is, $\mathbf{1}'X = k\mathbf{1}'$, and thus we may take $h = k^{-1}\mathbf{1}$.) Thus the minimization of var $[g(Y)]$ subject to (5.20) is achieved by choosing $d$ to minimize $d'd$ subject to

$$\bar X d = l - X\mathbf{1}e_0 = l - Ahe_0.$$

This, however, coincides with the problem of finding a best estimate for $(l - Ahe_0)'\beta$ in the model specified by (5.18) and (5.19); the Gauss theorem yields the solution as

$$d'\bar Y = (l - Ahe_0)'\tilde\beta$$

where $\tilde\beta$ is any solution of the normal equations obtained using (5.18) and (5.19). Since $MM' = I - n^{-1}J$, we find that these normal equations are

$$X[I - n^{-1}J]X'\tilde\beta = X[I - n^{-1}J]Y. \tag{5.21}$$

Thus the best estimate $g(Y)$ of $\theta = l'\beta$ is

$$\hat\theta = (l - Ahe_0)'\tilde\beta + \mathbf{1}'Ye_0 = l'\tilde\beta - l'h(n^{-1}h'A\tilde\beta) + l'h(n^{-1}\mathbf{1}'Y) = l'\hat\beta$$

where $\hat\beta = \tilde\beta + n^{-1}h(\mathbf{1}'Y - \mathbf{1}'X'\tilde\beta)$. Use of (5.21) and $h'X = \mathbf{1}'$ leads to $A\hat\beta = XY$. Conversely, if $\hat\beta$ is any solution of $A\hat\beta = XY$, then choosing $\tilde\beta = \hat\beta$ yields a solution of (5.21), and also

$$\hat\theta = (l - Ahe_0)'\tilde\beta + \mathbf{1}'Ye_0 = l'\hat\beta.$$
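
A small numerical check makes the independence from $\rho$ concrete. The sketch below is ours, not the paper's: it assumes a nonsingular $A$ (so that ordinary inverses replace $C$), uses made-up data, and compares the solution of $A\hat\beta = XY$ with the solution of the $\rho$-dependent equations $XV^{-1}X'\hat\beta = XV^{-1}Y$. When $X$ contains a row of ones (so $h'X = \mathbf{1}'$ with $h$ a unit vector) the two agree for every admissible $\rho$; without such a row they generally differ.

```python
from fractions import Fraction as F

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def solve(a, b):
    """Solve a x = b for square nonsingular a (Gauss-Jordan, exact arithmetic)."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for c in range(n):
        piv = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[piv] = aug[piv], aug[c]
        pivot = aug[c][c]
        aug[c] = [e / pivot for e in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [[row[n]] for row in aug]

def v_inverse(n, rho):
    """Closed-form inverse of V = (1 - rho) I + rho J."""
    c = rho / (1 + (n - 1) * rho)
    return [[(F(int(i == j)) - c) / (1 - rho) for j in range(n)] for i in range(n)]

def estimates(X, Y, rho):
    """Return (solution of A b = X Y, solution of X V^-1 X' b = X V^-1 Y)."""
    Xt = transpose(X)
    ordinary = solve(matmul(X, Xt), matmul(X, Y))
    XVi = matmul(X, v_inverse(len(Y), rho))
    weighted = solve(matmul(XVi, Xt), matmul(XVi, Y))
    return ordinary, weighted

# Hypothetical data: n = 4 observations.
Y = [[F(v)] for v in (1, 3, 2, 7)]
with_ones = [[F(v) for v in row] for row in ([1, 1, 1, 1], [0, 1, 2, 4])]
without_ones = [[F(v) for v in (0, 1, 2, 4)]]

for rho in (F(1, 3), F(-1, 5)):
    a, b = estimates(with_ones, Y, rho)
    c, d = estimates(without_ones, Y, rho)
    print(rho, a == b, c == d)
```

The second design has no vector $h$ with $h'X = \mathbf{1}'$, and its weighted solution moves with $\rho$, which is the situation treated next in the text.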

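Now assume $\bar X = XM$ has the same rank $q$ as $X$. To minimize var $[g(Y)]$, first treat $e$ as fixed; as in the previous case we are led to the choice

$$d'\bar Y = (l - X\mathbf{1}e)'\tilde\beta$$

where $\tilde\beta$ is any solution of (5.21). The rank hypothesis implies that the same $H$, and thus the same $K$ and $C$, work for $\bar X\bar X'$ as for $A$, and so we may choose $\tilde\beta = C\bar X\bar Y$. Now

$$g(Y) = l'\tilde\beta + e\,\mathbf{1}'(Y - X'\tilde\beta),$$

$$\operatorname{var}[g(Y)] = \operatorname{var}(l'\tilde\beta) + e^2 \operatorname{var}[\mathbf{1}'(Y - X'\tilde\beta)] + 2e \operatorname{cov}[l'\tilde\beta,\ \mathbf{1}'(Y - X'\tilde\beta)].$$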

The range of $e$ in the remaining minimization problem is that of all real numbers. To prove this, note that by the rank hypothesis $l = \bar X d_0$ for some $(n-1) \times 1$ vector $d_0$ (i.e., each estimable form is also estimable with respect to (5.18)), and also $X = \bar X N$ for some matrix $N$, so that for any real number $e$

$$l - X\mathbf{1}e = \bar X d \qquad (d = d_0 - N\mathbf{1}e)$$

as required by (5.20). The solution of the minimization problem is therefore

$$e = -\operatorname{cov}[l'\tilde\beta,\ \mathbf{1}'(Y - X'\tilde\beta)] / \operatorname{var}[\mathbf{1}'(Y - X'\tilde\beta)]$$

$$= (1-\rho)\,l'CX\mathbf{1} / \{n[1 + (n-1)\rho] + (1-\rho)\mathbf{1}'X'CX\mathbf{1}\}.$$

This is independent of $\rho$ if and only if the numerator vanishes, i.e.,

$$l'CX\mathbf{1} = 0, \tag{5.22}$$

and in that event the best estimate reduces to

$$l'\tilde\beta = l'CX(I - n^{-1}J)Y = l'CXY = l'\hat\beta$$

where $\hat\beta$ is a solution of $A\hat\beta = XY$. Note that $l'CX\mathbf{1} = 0$ will hold for all estimable functions if and only if

$$X\mathbf{1} = 0.$$

Before assembling these results (with a few more substitutions) into a formal theorem, we remark that $XM$ has the same rank as

$$XMM'X' = X(I - n^{-1}J)X' = X(I - n^{-1}J)^2X',$$

and thus the same rank as the matrix

$$X^{*} = X[I - n^{-1}J]$$

obtained from $X$ by simply taking deviations from the mean, i.e.,

$$x^{*}_{i\alpha} = x_{i\alpha} - \bar x_i, \qquad \bar x_i = n^{-1}\sum_{\alpha=1}^{n} x_{i\alpha}.$$

Thus when $X^{*}$ and $X$ have the same rank, a solution of (5.21) can be obtained as

$$\tilde\beta = CX[I - n^{-1}J]Y = \hat\beta - n^{-1}CXJY.$$

Theorem 5. Let

$$E(Y) = X'\beta, \qquad \operatorname{var}(Y) = \sigma^2[(1-\rho)I + \rho J],$$

$-(n-1)^{-1} < \rho < 1$. If the rank $q^{*}$ of $X^{*} = X[I - n^{-1}J]$ is $q - 1$ (i.e., there exists a $p \times 1$ vector $h$ such that $h'X = \mathbf{1}'$), the normal equations are $A\hat\beta = XY$ and do not depend on $\rho$. When $q^{*} = q$, the only estimable functions $\theta = l'\beta$ with best estimate independent of $\rho$ are those with

$$l'CX\mathbf{1} = 0, \tag{5.22}$$

and for these the best estimate is $l'\hat\beta$ with $A\hat\beta = XY$. If (5.22) does not hold, the best estimate of $\theta$ is

$$\hat\theta = l'\left[\hat\beta - n^{-1}CXJY + \{n[n\rho(1-\rho)^{-1} + 1] + \mathbf{1}'X'CX\mathbf{1}\}^{-1}\,CXJ\,(Y - X'\hat\beta + n^{-1}X'CXJY)\right]$$

with $\hat\beta$ as above.

Corollary 5.1. Let the deviations $Y^{*}$ and $X^{*}$ be defined by $Y^{*} = M\bar Y = MM'Y$, $X^{*} = \bar X M' = XMM'$. Then the quantity

$$S^2 = (Y^{*} - X^{*\prime}\tilde\beta)'(Y^{*} - X^{*\prime}\tilde\beta)$$

has expectation

$$E(S^2) = (n - q)(1-\rho)\sigma^2$$

if a vector $h$ exists for which $h'X = \mathbf{1}'$; otherwise the expectation of $S^2$ is

$$E(S^2) = (n - q - 1)(1-\rho)\sigma^2.$$

PROOF. The expectation of $(\bar Y - \bar X'\tilde\beta)'(\bar Y - \bar X'\tilde\beta)$ is $(n - q)(1-\rho)\sigma^2$ if $\bar X$ has rank $q - 1$ (i.e., a vector $h$ exists for which $h'X = \mathbf{1}'$). When $\bar X$ has rank $q$, the expectation is $(n - q - 1)(1-\rho)\sigma^2$. These results immediately follow by applying Corollary 1.4 of the Gauss theorem. Since

$$\bar Y - \bar X'\tilde\beta = M'(Y - X'\tilde\beta)$$

and $(MM')^2 = MM' = I - n^{-1}J$, we have

$$(\bar Y - \bar X'\tilde\beta)'(\bar Y - \bar X'\tilde\beta) = (Y - X'\tilde\beta)'MM'(Y - X'\tilde\beta) = (Y^{*} - X^{*\prime}\tilde\beta)'(Y^{*} - X^{*\prime}\tilde\beta).$$

The problem arises as to what to do if there does not exist an $h$ for which $h'X = \mathbf{1}'$ (i.e., if $XM$ has the same rank $q$ as $X$), $\rho$ is unknown, and we wish to estimate an estimable function for which $l'CX\mathbf{1} \ne 0$. Estimates of $\theta = l'\beta$ can be obtained if we are willing to consider the alternate estimation problem where

$$E(\bar Y) = \bar X'\beta, \qquad \operatorname{var}\bar Y = (1-\rho)\sigma^2 I \tag{5.23}$$

subject to the restraint

$$\mathbf{1}'X'\beta = \mathbf{1}'Y,$$

which must be pre-estimable.

Application of (4.16) results in the normal equations

$$X(I - n^{-1}J)X'\hat\beta + X\mathbf{1}\lambda = X(I - n^{-1}J)Y,$$
$$\mathbf{1}'X'\hat\beta = \mathbf{1}'Y,$$

which reduce to

$$A\hat\beta + X\mathbf{1}\lambda = XY,$$
$$\mathbf{1}'X'\hat\beta = \mathbf{1}'Y.$$

After premultiplication by the nonzero $1 \times p$ vector $\mathbf{1}'X'$, the first equation can be solved for $\lambda$ to obtain

$$\lambda = (\mathbf{1}'X'X\mathbf{1})^{-1}\mathbf{1}'X'(XY - A\hat\beta),$$

and then the normal equations for $\hat\beta$ alone are obtained as

$$[I - (\mathbf{1}'X'X\mathbf{1})^{-1}XJX']A\hat\beta = [I - (\mathbf{1}'X'X\mathbf{1})^{-1}XJX']XY,$$

$$\mathbf{1}'X'\hat\beta = \mathbf{1}'Y.$$

Alternatively, we can apply Theorem 2 to the model given by (5.23) and the restraint $\mathbf{1}'X'\beta = \mathbf{1}'Y$. Since $\bar X$ has the same rank as $X$, $H$ has the same relation to $\bar X$ as to $X$. Thus, by Lemmas 1 and 2, the matrices $K$ and $C$ are the same for $\bar X$ as for $X$. Here $H_1$ and $K_1$ are null, while $H_0$ and $K_0$ correspond to $H$ and $K$, respectively. The result of applying Theorem 2 is given next as another theorem; note that the estimate $\hat\theta = l'\hat\beta$ coincides with that given in Theorem 5 when $l'CX\mathbf{1} = 0$.

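Theorem 6. For the model

$$E(Y) = X'\beta, \qquad \operatorname{var} Y = \sigma^2[(1-\rho)I + \rho J]$$

where $-(n-1)^{-1} < \rho < 1$, the parameter $\rho$ is unknown, and $\bar X = XM$ has the same rank as $X$, the best estimate conditional on $\mathbf{1}'X'\beta = \mathbf{1}'Y$ of the estimable function $\theta = l'\beta$ is given by

$$\hat\theta = l'\hat\beta = l'CX\{(I - n^{-1}J) + (\mathbf{1}'X'CX\mathbf{1})^{-1}J[I - X'CX(I - n^{-1}J)]\}Y. \tag{5.24}$$

Corollary 6.1. The quantity

$$S^2 = (Y^{*} - (X^{*})'\hat\beta)'(Y^{*} - (X^{*})'\hat\beta),$$

where $X^{*} = X(I - n^{-1}J)$ and $Y^{*} = (I - n^{-1}J)Y$, has the conditional expectation

$$E(S^2 \mid \mathbf{1}'Y) = (1-\rho)\sigma^2(n - q).$$

Proof. We first observe that

$$S^2 = (\bar Y - \bar X'\hat\beta)'(\bar Y - \bar X'\hat\beta),$$

so that $E(S^2 \mid \mathbf{1}'Y)$ can be found by applying Corollary 2.4 to the model consisting of (5.23) and the restraint $\mathbf{1}'X'\beta = \mathbf{1}'Y$. Here $k_2 = 1$, $n$ is replaced by $n - 1$ since $\bar X$ is $p \times (n-1)$, and $\sigma^2$ is replaced by $(1-\rho)\sigma^2$. This proof also shows, by (4.27), that $S^2$ can be written as

$$S^2 = (Y^{*} - (X^{*})'\hat\beta_0)'(Y^{*} - (X^{*})'\hat\beta_0) + \lambda^2(\mathbf{1}'X'CX\mathbf{1})$$

where $\hat\beta_0 = CX(I - n^{-1}J)Y$ and

$$\lambda = (\mathbf{1}'X'CX\mathbf{1})^{-1}[\mathbf{1}'X'CX(I - n^{-1}J)Y - \mathbf{1}'Y].$$
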
5.5. Two Stage Least Squares 

An application of Theorem 4 arises in two stage least squares estimation, which has recently been discussed by Freund, Vail, and Clunies-Ross (1961) and Goldberger and Jochems (1961). We shall consider some further generalizations and discuss the matter more fully. Consider the model

$$E(Y) = X_1'\beta_1 + X_2'\beta_2, \qquad \operatorname{var} Y = \sigma^2 I \tag{5.25}$$

where $X_i$ are $p_i \times n$ matrices and $\beta_i$ are $p_i \times 1$ vectors for $i = 1, 2$. Instead of considering the full model, in the first stage we ignore the $\beta_2$ variables and take $E(Y) = X_1'\beta_1$. Then the normal equations will yield the solution

$$\tilde\beta_1 = C_1X_1Y \tag{5.26}$$

where $C_1$ is related to $X_1X_1'$ as $C$ is to $A$. Define the residual vector

$$\delta = Y - X_1'\tilde\beta_1 = (I - X_1'C_1X_1)Y$$

and the idempotent matrix

$$V = I - X_1'C_1X_1.$$

Then we have

$$E(\delta) = VX_2'\beta_2,$$

$$\operatorname{var}(\delta) = V\sigma^2,$$
and these equations serve as the model for the second 
stage. Now apply Theorem 4 to this model: V=V-^ 
since V is idempotent, the analogs of ^ and F'=F[ 
are X2V and Xi respectively with Xi(X2V)' = since 
ZiF=0, and so the result is the equation 



$$(X_2VX_2')\tilde\beta_2 = X_2V\delta = X_2VY \tag{5.27}$$

with solution

$$\tilde\beta_2 = C_2X_2V\delta = C_2X_2VY, \tag{5.28}$$

where $C_2$ is related to $X_2VX_2'$ as is $C$ to $A$.

Suppose $\theta = l_1'\beta_1 + l_2'\beta_2$ is estimable in the full model. Then (see (3.4)) there exists an $n \times 1$ vector $d$ such that $X_id = l_i$ $(i = 1, 2)$, and so $\theta_1 = l_1'\beta_1$ is estimable in the first-stage model. Its best estimate in that model is

$$\tilde\theta_1 = l_1'\tilde\beta_1,$$

and in the full model

$$E(\tilde\theta_1) = l_1'C_1X_1(X_1'\beta_1 + X_2'\beta_2) = \theta_1 + l_1'C_1X_1X_2'\beta_2. \tag{5.29}$$

The procedure to be described involves adding a term to $\tilde\theta_1$ to obtain an unbiased estimate $\hat\theta_1$ of $\theta_1$. Clearly this will be possible only if $\theta_1$ is in fact estimable in the full model. We therefore are led to determine what condition on the partition $[X_1', X_2']$ will ensure that $\theta_1 = l_1'\beta_1$ is estimable in the full model whenever $\theta = l'\beta$ is. First suppose the partition has this property. Since the rows of $X'\beta$ are estimable in the full model, the same must hold for the rows of $X_1'\beta_1$ and thus for the rows of

$$X_2'\beta_2 = X'\beta - X_1'\beta_1.$$

By (3.4) there is an $n \times n$ matrix $B$ such that $X_1B = 0$ and $X_2B = X_2$. $X_1B = 0$ implies that $B = VB_1$ for some $n \times n$ matrix $B_1$, and so $X_2VB_1 = X_2$. The last equation shows that the rows of $X_2'\beta_2$ are estimable in the second-stage model, or equivalently (by Corollary 1.1)

$$X_2 = X_2VX_2'C_2X_2. \tag{5.30}$$

Conversely, suppose (5.30) holds and that

$$\theta = l_1'\beta_1 + l_2'\beta_2$$

is any parametric function estimable in the full model. By (3.4), there exists an $n \times 1$ vector $d$ such that

$$X_id = l_i \qquad (i = 1, 2).$$

Then

$$X_1(I - VX_2'C_2X_2)d = l_1, \qquad X_2(I - VX_2'C_2X_2)d = 0,$$

so that by (3.4) $\theta_1 = l_1'\beta_1$ is estimable in the full model. Hence (5.30) is exactly the required condition on $[X_1', X_2']$, and is assumed in what follows.

An unbiased estimate of $\theta_1$ in the full model can now be given as

$$\hat\theta_1 = \tilde\theta_1 - l_1'C_1X_1X_2'C_2X_2VY = l_1'\hat\beta_1.$$

Since $\theta$ and $\theta_1$ are estimable in the full model, the same is true of

$$\theta_2 = l_2'\beta_2 = \theta - \theta_1,$$

so that

$$X_1d_2 = 0, \qquad X_2d_2 = l_2$$

for some $n \times 1$ vector $d_2$. From this and (5.30) it can be verified that

$$\hat\theta_2 = l_2'\tilde\beta_2$$

is an estimate (therefore the best estimate) of $\theta_2$ in the second-stage model, and also an unbiased estimate of $\theta_2$ in the full model.

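It has been shown that an unbiased estimate of the estimable function

$$\theta = l_1'\beta_1 + l_2'\beta_2$$

is given by

$$\hat\theta = \hat\theta_1 + \hat\theta_2 = l_1'\hat\beta_1 + l_2'\hat\beta_2,$$

where

$$\hat\beta_1 = C_1[I + X_1X_2'C_2X_2X_1'C_1]X_1Y - C_1X_1X_2'C_2X_2Y,$$

$$\hat\beta_2 = C_2X_2[I - X_1'C_1X_1]Y. \tag{5.31}$$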

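The solutions (5.31) can be shown by substitution to satisfy the normal equations

$$\begin{bmatrix} X_1X_1' & X_1X_2' \\ X_2X_1' & X_2X_2' \end{bmatrix} \begin{bmatrix} \hat\beta_1 \\ \hat\beta_2 \end{bmatrix} = \begin{bmatrix} X_1Y \\ X_2Y \end{bmatrix}$$

of the full model, and so $\hat\theta$ is the minimum variance linear unbiased estimate of $\theta$.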
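
The claim that the two-stage solution satisfies the full normal equations can be verified on a small example. The following sketch is illustrative only: it assumes the nonsingular case $q = p$ (so that $C_1$ and $C_2$ are ordinary inverses), the data are invented, and the function names `two_stage` and `full_model` are ours.

```python
from fractions import Fraction as F

def frac(rows):
    return [[F(v) for v in row] for row in rows]

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def sub(a, b):
    return [[x - y for x, y in zip(r, s)] for r, s in zip(a, b)]

def inverse(a):
    """Inverse of a square nonsingular matrix (Gauss-Jordan, exact arithmetic)."""
    n = len(a)
    aug = [list(row) + [F(int(i == j)) for j in range(n)] for i, row in enumerate(a)]
    for c in range(n):
        piv = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[piv] = aug[piv], aug[c]
        pivot = aug[c][c]
        aug[c] = [e / pivot for e in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def two_stage(X1, X2, Y):
    """Stage 1 fits X1 alone; stage 2 fits the residuals; stage-1 result is then corrected."""
    n = len(Y)
    identity = [[F(int(i == j)) for j in range(n)] for i in range(n)]
    C1 = inverse(matmul(X1, transpose(X1)))
    b1_first = matmul(C1, matmul(X1, Y))                        # as in (5.26)
    V = sub(identity, matmul(transpose(X1), matmul(C1, X1)))    # idempotent
    X2V = matmul(X2, V)
    C2 = inverse(matmul(X2V, transpose(X2)))
    b2 = matmul(C2, matmul(X2V, Y))                             # as in (5.28)
    b1 = sub(b1_first, matmul(C1, matmul(X1, matmul(transpose(X2), b2))))
    return b1, b2

def full_model(X1, X2, Y):
    """Direct solution of the full normal equations (X X') b = X Y."""
    X = X1 + X2                                                 # stack the rows
    return matmul(inverse(matmul(X, transpose(X))), matmul(X, Y))

# Hypothetical data: p1 = 1, p2 = 2, n = 5.
X1 = frac([[1, 1, 1, 1, 1]])
X2 = frac([[0, 1, 2, 3, 5], [1, 0, 2, 1, 3]])
Y = frac([[2], [1], [4], [3], [7]])

b1, b2 = two_stage(X1, X2, Y)
print(b1 + b2 == full_model(X1, X2, Y))
```

Exact rational arithmetic is used so that the comparison is an identity rather than an approximate match; with singular $A$ one would need a weak generalized inverse in place of `inverse`.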

For the same reason, $\hat\theta_i = l_i'\hat\beta_i$ is the best estimate of $\theta_i = l_i'\beta_i$ in the full model. In terms of this model alone, the following result has been proved: If the portions of every estimable function which respectively involve the $\beta_1$ and $\beta_2$ variables are separately estimable, then the best estimate of each such function is simply the sum of the best estimates of its portions. In this sense the condition (5.30) can be regarded as a generalization of orthogonality ($X_2X_1' = 0$); in the orthogonal case the normal equations (5.27) of the second-stage model are simply

$$(X_2X_2')\tilde\beta_2 = X_2Y$$

in direct analogy to those of the first-stage model. Note also that (5.30) automatically holds if $q = p$ (i.e., if $A$ is nonsingular), since then every parametric function $l'\beta$, in particular $l_1'\beta_1$, is estimable.

5.6. Restraints Subject to Uncertainty 

Occasionally situations arise in which the given restraints $K'\beta = m$ are themselves subject to variation. Such may be the case when the value of $K'\beta$ is not known but prior information is available which can be summarized as a value of a random vector $\hat m$ with $E(\hat m) = K'\beta$ and with precision described by var $(\hat m) = V_m\sigma^2$. A circumstance where this may occur is when data are available from another source which is believed to be without bias or systematic error.

Let $E(Y) = X'\beta$, var $Y = \sigma^2 I$, and let the $k$ "given" restraints consist of unbiased estimates $\hat m_i$ $(i = 1, 2)$ of $K_i'\beta$, where $K_i$ is $p \times k_i$ of rank $k_i$, and $\hat m' = (\hat m_1', \hat m_2')$ obeys var $(\hat m) = V_m\sigma^2$. Further it is assumed that the restraints $K_1'\beta$ are nonpre-estimable functions and $K_2'\beta$ are pre-estimable functions with respect to the observational equations $E(Y) = X'\beta$. It is desired to perform estimation subject to the additional conditions $K'\hat\beta = \hat m$, i.e., to fit the new data so that the quantities $K'\hat\beta$ are exactly equal to $\hat m$. We may assume without loss of generality that $K_2'K_2 = I$ and that the restraints $K_1'\beta$ are irreducible.

It will be convenient to introduce the expression undisturbed to refer to those estimable functions $\theta = l'\beta$ whose best estimate $\hat\theta = l'\hat\beta$ is not altered by the requirement that $\hat\beta$ be chosen to satisfy $K'\hat\beta = \hat m$. Not all estimable functions are undisturbed in general; for example we have no freedom in choosing $\hat\theta$ when $\theta$ is a linear combination of the rows of $K_2'\beta$. The subclass of the estimable functions, consisting of those which are undisturbed, is a matter of choice and its selection would presumably depend on the problem at hand, but it should not contain any nonzero linear combinations of the rows of $K_2'\beta$. (If for example there is skepticism concerning the prior information, then this subclass would be chosen to include, so far as possible, those functions for which a minimum variance estimate is of particular importance.) The class of undisturbed functions may be chosen, of the maximum possible dimension, as the class of all linear combinations of the rows of $L'\beta$ where $L$ is a $p \times (q - k_2)$ matrix of rank $q - k_2$ such that $H'L = 0$ and $[K_2, L]$ has rank $q$. We may assume $L'L = I$ without loss of generality.

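Because $H'[K_2, L] = 0$, and $[K_2, L]$ has the same rank $q$ as $X$, there exist a $k_2 \times n$ matrix $P_2$ and a $(q - k_2) \times n$ matrix $P$ such that

$$X = [K_2, L]\begin{bmatrix} P_2 \\ P \end{bmatrix} = K_2P_2 + LP.$$

These matrices can be found explicitly, in terms of the inverse $N^{-1}$ of the $q \times q$ nonsingular matrix

$$N = \begin{bmatrix} K_2' \\ L' \end{bmatrix}[K_2, L] = \begin{bmatrix} I & K_2'L \\ L'K_2 & I \end{bmatrix},$$

as

$$\begin{bmatrix} P_2 \\ P \end{bmatrix} = N^{-1}\begin{bmatrix} K_2' \\ L' \end{bmatrix}X.$$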

Since $E(\hat m_2) = K_2'\beta$, we find that $E(Y) = X'\beta$ is equivalent to

$$E(\bar Y) = \bar X'\beta \tag{5.32}$$

where $\bar Y = Y - P_2'\hat m_2$ and $\bar X = LP = X - K_2P_2$. Similarly, under the assumption cov $(Y, \hat m_2) = 0$ which is implicit in our situation, it follows that var $(Y) = \sigma^2 I$ is equivalent to

$$\operatorname{var}(\bar Y) = \bar V\sigma^2, \tag{5.33}$$

where $\bar V = I + P_2'V_2P_2$ and $V_2$ is defined by

$$\operatorname{var}(\hat m_2) = V_2\sigma^2.$$

Thus the original model $E(Y) = X'\beta$, var $Y = \sigma^2 I$, ignoring the restraint $K'\hat\beta = \hat m$, is equivalent to the one given by (5.32) and (5.33).

From the fact that equality holds throughout the sequence

$$q = \operatorname{rank}(X) = \operatorname{rank}(K_2P_2 + LP) \le \operatorname{rank}(K_2P_2) + \operatorname{rank}(LP) \le \operatorname{rank}(K_2) + \operatorname{rank}(L) = k_2 + (q - k_2) = q$$

of inequalities, it follows in particular that $\bar X = LP$ has the same rank $q - k_2$ as $L$. Therefore the class of functions estimable with respect to (5.32) consists of all linear combinations of the rows of $L'\beta$.

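We next prove that an analog of $H$ for (5.32) is given by $\bar H = [H, (I - LL')K_2]$. Since $\bar H'\bar X = 0$, it suffices to show that $\bar H$ has rank at least $p - (q - k_2) = r + k_2$; since

$$H'(I - LL')K_2 = 0$$

and $H$ has rank $r$, it suffices to show that $(I - LL')K_2$ has at least rank $k_2$. This however follows from the consequence

$$\begin{bmatrix} I \\ -L'K_2 \end{bmatrix} = N^{-1}\begin{bmatrix} K_2' \\ L' \end{bmatrix}(I - LL')K_2$$

of the identity

$$(I - LL')K_2 = [K_2, L]\begin{bmatrix} I \\ -L'K_2 \end{bmatrix}.$$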


From the irreducibility of the restraints $K_1'\beta$ in the original model, we can deduce that the restraints $K'\beta$, where $K = [K_1, K_2]$, are nonestimable and irreducible with respect to (5.32). Namely,

$$\bar H'K = \begin{bmatrix} H'K_1 & 0 \\ K_2'(I - LL')K_1 & K_2'(I - LL')K_2 \end{bmatrix}$$

can be shown to have rank $k_1 + k_2$. For this purpose, observe that $H'K_1$ has rank $k_1$ so that the same holds for the first block column in $\bar H'K$. Also, since $(I - LL')K_2$ has rank $k_2$, the same is true of

$$\{(I - LL')K_2\}'\{(I - LL')K_2\} = K_2'(I - LL')K_2$$

and thus of the second block column. The presence of the zero block then ensures the result.

Theorem 4 can be applied to the model consisting of (5.32) and (5.33), to obtain a new model analogous to (5.8), and the restraints $K'\beta$ will remain nonestimable in this new model. Thus the conditions $K'\hat\beta = \hat m$ can simply be adjoined to the normal equations of the new model without affecting the best estimates of the functions estimable in this new model, i.e., the linear combinations of the rows of $L'\beta$. Thus, as desired, these linear forms have their best estimates "undisturbed" by requiring $K'\hat\beta = \hat m$. (Here $K$ plays the role of $K_0$ in Theorem 2.) If in particular $\bar V$ is nonsingular, then by Corollary 4.1 the normal equations become

$$(\bar X\bar V^{-1}\bar X')\hat\beta = \bar X\bar V^{-1}\bar Y.$$

It may also be appropriate to adjoin artificial nonpre-estimable restraints to secure a unique solution for $\hat\beta$.

The previous material also permits us to arrive at unbiased estimates, consistent with $K'\hat\beta = \hat m$, of functions $\theta = l'\beta$ which are estimable in the original model but are not linear combinations of the rows of $L'\beta$. From

$$l = ACl = (K_2P_2 + LP)X'Cl$$

it follows that

$$\theta = l'CXP_2'K_2'\beta + l'CXP'L'\beta,$$

so that an unbiased estimate is

$$\hat\theta = l'CXP_2'\hat m_2 + l'CXP'L'\hat\beta = l'CX(P_2'K_2' + P'L')\hat\beta = l'\hat\beta$$

with $\hat\beta$ as in the last paragraph. Note that although the second summand ($l'CXP'L'\hat\beta$) in $\hat\theta$ is the best estimate of the second summand of $\theta$, $\hat\theta$ as a whole does not coincide with the best estimate of $\theta$ in the original model, since $\hat\beta$ comes from a set of normal equations other than $A\hat\beta = XY$. Thus $\theta$ has been disturbed.

The previous material takes an especially simple form when $L'K_2 = 0$, i.e., when the estimable functions whose minimum-variance estimation is to be emphasized ($L'\beta$) are orthogonal to those whose estimates are prescribed ($K_2'\beta$). Here premultiplication of

$$X = K_2P_2 + LP$$

by $K_2'$ and $L'$, respectively, shows that $P_2 = K_2'X$ and $P = L'X$. Thus $\bar H = [H, K_2]$, and the model (5.32) and (5.33) becomes

$$E(\bar Y) = \bar X'\beta, \qquad \operatorname{var}(\bar Y) = \bar V\sigma^2$$

with

$$\bar Y = Y - X'K_2\hat m_2, \qquad \bar X = LL'X, \qquad \bar V = I + X'K_2V_2K_2'X.$$

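For a simple but artificial example, suppose

$$X = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \qquad K = K_2 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \qquad L = \begin{bmatrix} 0 \\ 1 \end{bmatrix},$$

i.e., estimation of the second component $\beta_2$ of $\beta$ is of principal importance. Suppose also that

$$\operatorname{var}(\hat m_2) = V_2\sigma^2 = \tau^2\sigma^2,$$

so that $\tau$ indicates the precision of the prior information relative to the new measurements under discussion. The previous paragraph applies, and we are led to the model

$$E(\bar Y) = \bar X'\beta, \qquad \operatorname{var}(\bar Y) = \bar V\sigma^2$$

with

$$\bar Y = \begin{bmatrix} y_1 - \hat m_2 \\ y_2 \end{bmatrix}, \qquad \bar X = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}, \qquad \bar V = \begin{bmatrix} 1 + \tau^2 & 0 \\ 0 & 1 \end{bmatrix}.$$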

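This can be rewritten

$$E(\tilde Y) = \tilde X'\beta, \qquad \operatorname{var}(\tilde Y) = \sigma^2 I$$

with

$$\tilde X = \bar X, \qquad \tilde Y' = ((y_1 - \hat m_2)(1 + \tau^2)^{-1/2},\ y_2)'.$$

The normal equations of the new model are $(\tilde X\tilde X')\hat\beta = \tilde X\tilde Y$, i.e.,

$$0\hat\beta_1 + 0\hat\beta_2 = 0(y_1 - \hat m_2)(1 + \tau^2)^{-1/2} + 0y_2,$$

$$0\hat\beta_1 + \hat\beta_2 = 0(y_1 - \hat m_2)(1 + \tau^2)^{-1/2} + y_2,$$

to which we adjoin $K'\hat\beta = \hat m_2$, i.e., $\hat\beta_1 = \hat m_2$. Thus the result is

$$\hat\beta' = (\hat m_2, y_2)',$$

whereas without the requirement $K'\hat\beta = \hat m_2$ we would have

$$\hat\beta' = (y_1, y_2)'.$$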



The estimate assigned to $\theta = \beta_1 + \beta_2$, which is not a linear combination of the rows of $L'\beta$, is

$$\hat\theta = \hat\beta_1 + \hat\beta_2 = \hat m_2 + y_2$$

and has variance $\tau^2\sigma^2 + \sigma^2$, whereas the best estimate of $\theta$ in the original model is $y_1 + y_2$ with variance $2\sigma^2$. Thus the requirement $K'\hat\beta = \hat m_2$ decreases or increases the variance of the estimate of $\theta$ according as $\tau < 1$ or $\tau > 1$, i.e., according as the prior measurement of $\hat m_2$ was more or less precise than the new measurement of $y_1$.
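
The variance comparison in this example reduces to simple arithmetic, sketched below with exact fractions. The function names are ours, $\sigma^2$ is set to 1 for illustration, and the prior value and $y_2$ are taken as uncorrelated, in line with the assumption cov$(Y, \hat m_2) = 0$ made above.

```python
from fractions import Fraction as F

def variance_with_restraint(tau, sigma2=F(1)):
    """var(m2 + y2) when var(m2) = tau^2 sigma^2, var(y2) = sigma^2, cov = 0."""
    return tau ** 2 * sigma2 + sigma2

def variance_original(sigma2=F(1)):
    """var(y1 + y2) for uncorrelated observations of common variance sigma^2."""
    return 2 * sigma2

for tau in (F(1, 2), F(1), F(2)):
    restrained = variance_with_restraint(tau)
    baseline = variance_original()
    if restrained < baseline:
        verdict = "restraint decreases the variance"
    elif restrained > baseline:
        verdict = "restraint increases the variance"
    else:
        verdict = "no change"
    print(tau, restrained, baseline, verdict)
```

The crossover at $\tau = 1$ is the point where the prior value is exactly as precise as the new measurement it replaces.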
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